Re: TrieDateField, precisionStep impact on sorting performance
Thanks for clarifying! Dennis On 7/16/14 3:19 PM, "Yonik Seeley" wrote: >On Wed, Jul 16, 2014 at 5:51 AM, Kuehn, Dennis > wrote: >> I'd like to sort on a TrieDateField which currently has a precisionStep >>value of 6. >> From what I got so far, the precisionStep value only affects range >>query performance and index size. >> >> However, the documentation for TrieDateField says: >> 'precisionStep="0" enables efficient date sorting and minimizes index >>size; precisionStep="8" (the default) enables efficient range queries.' >> >> Does this mean sorting performance will suffer for precisionStep values >>other than 0? > >No, sorting speed is unaffected by precisionStep. That comment looks >slightly misleading. > >-Yonik >http://heliosearch.org - native code faceting, facet functions, >sub-facets, off-heap data
Re: TrieDateField, precisionStep impact on sorting performance
On Wed, Jul 16, 2014 at 5:51 AM, Kuehn, Dennis wrote: > I'd like to sort on a TrieDateField which currently has a precisionStep value > of 6. > From what I got so far, the precisionStep value only affects range query > performance and index size. > > However, the documentation for TrieDateField says: > 'precisionStep="0" enables efficient date sorting and minimizes index size; > precisionStep="8" (the default) enables efficient range queries.' > > Does this mean sorting performance will suffer for precisionStep values other > than 0? No, sorting speed is unaffected by precisionStep. That comment looks slightly misleading. -Yonik http://heliosearch.org - native code faceting, facet functions, sub-facets, off-heap data
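A minimal Python sketch of why the two concerns are separate, under simplifying assumptions. It models the general trie idea for a 64-bit value (TrieDateField stores dates as epoch-millisecond longs) and ignores Lucene's real byte-level term format. Each value gets one extra term per `precisionStep` bits shifted off, which is what speeds up range queries. The exact (shift 0) term, which is what a sort compares, is the same for every step. `trie_terms` is a hypothetical helper, and treating precisionStep=0 as "exact term only" is an assumption of the sketch.

```python
from typing import List, Tuple


def trie_terms(value: int, precision_step: int, bits: int = 64) -> List[Tuple[int, int]]:
    """Simplified trie encoding: the (shift, prefix) pairs indexed for one value.

    Assumes a non-negative value; real Lucene terms also flip the sign bit and
    pack the prefix into bytes, which is omitted here.
    """
    if precision_step <= 0:
        # Treated here as "full precision only": exactly one term per value.
        return [(0, value)]
    # One term per shift; each drops the lowest `shift` bits, so a wide range
    # can be matched by a few coarse terms instead of many exact ones.
    return [(shift, value >> shift) for shift in range(0, bits, precision_step)]


if __name__ == "__main__":
    millis = 1405518000000  # an arbitrary timestamp in epoch milliseconds
    for step in (0, 6, 8):
        terms = trie_terms(millis, step)
        # The shift-0 term (the exact value) is identical for every step.
        print(f"precisionStep={step}: {len(terms)} terms, exact term {terms[0]}")
```

In this model a precisionStep of 0, 6 and 8 yields 1, 11 and 8 terms per value. The step changes index size and range-query work, not the value a sort compares.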
TrieDateField, precisionStep impact on sorting performance
Hello, I'd like to sort on a TrieDateField which currently has a precisionStep value of 6. From what I got so far, the precisionStep value only affects range query performance and index size. However, the documentation for TrieDateField says: 'precisionStep="0" enables efficient date sorting and minimizes index size; precisionStep="8" (the default) enables efficient range queries.' Does this mean sorting performance will suffer for precisionStep values other than 0? Cheers, Dennis
Re: Sorting performance
Hi, this may help you get started: https://issues.apache.org/jira/browse/SOLR-1297 Dmitry On Mon, Jun 4, 2012 at 9:51 PM, Gau wrote: > Here is the usecase: > I am using synonym expansion at query time to get results. this is > essentially a name search, so a search for Jim may be expanded at query > time > for James, Jung, Jimmy, etc. > > So ranking fields like TF, IDF, Norms do not mean anything to me. I just > reset them to zero. so all the results which I get have the same rank. I > have used a copy field to boost the weights of exact match, so Jim would be > boosted to the top. > > However I want the other results like Jimmy, Jung, James to be sorted by > Levenstein Distance with respect to word Jim (the original query). The > number of results returned are quite large. So a genereal strdist sort > takes > 6-7 seconds. Is there any other option than applying a sort= in the query > to > achieve the same functionality? Any particular way to index the data to > achieve the same result? any idea to boost the performance and get the > intended functionality? > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Sorting-performance-tp3987633.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Regards, Dmitry Kan
Sorting performance
Here is the use case: I am using synonym expansion at query time to get results. This is essentially a name search, so a search for Jim may be expanded at query time to James, Jung, Jimmy, etc. So ranking factors like TF, IDF and norms do not mean anything to me. I just reset them to zero, so all the results I get have the same rank. I have used a copy field to boost the weight of exact matches, so Jim is boosted to the top. However, I want the other results like Jimmy, Jung and James to be sorted by Levenshtein distance with respect to the word Jim (the original query). The number of results returned is quite large, so a general strdist sort takes 6-7 seconds. Is there any other option than applying sort= in the query to achieve the same functionality? Is there any particular way to index the data to achieve the same result? Any idea how to boost performance and still get the intended functionality? -- View this message in context: http://lucene.472066.n3.nabble.com/Sorting-performance-tp3987633.html Sent from the Solr - User mailing list archive at Nabble.com.
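If the candidate set for a name is small enough to fetch, one option is to re-rank it on the client rather than asking Solr to `sort=` by `strdist`. The minimal Python sketch below is the textbook dynamic-programming Levenshtein distance plus a stable sort. It assumes the names are already in hand (for example, one page of results) and is not how Solr computes `strdist` internally. Both function names are hypothetical.

```python
from typing import Iterable, List


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: minimum insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def sort_by_edit_distance(query: str, names: Iterable[str]) -> List[str]:
    # Case-insensitive comparison; ties keep their incoming (score) order
    # because Python's sort is stable.
    q = query.lower()
    return sorted(names, key=lambda name: levenshtein(q, name.lower()))


if __name__ == "__main__":
    print(sort_by_edit_distance("Jim", ["James", "Jung", "Jimmy", "Jim"]))
```

Running it prints `['Jim', 'Jimmy', 'James', 'Jung']`: the exact match first, then by edit distance, with the 3-edit tie between James and Jung left in input order. Because only fetched results are re-ranked, paging has to happen after this sort, so the approach only pays off for modest result sets.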
Re: Sorting performance + replication of index between cores
Did you find a solution? I am having a similar issue. Setup: one indexer box and 2 searcher boxes, each with 6 different Solr cores. We have a lot of updates (in the range of a couple thousand items every few minutes). The snappuller/snapinstaller pulls and commits every 5 minutes. Query response time peaks at 60+ seconds when a new searcher is being prepared. I have disabled the caches (filter, query and document). We have a strict requirement of response time < 10 secs at all times. Thanks Sreeram sunnyfr wrote: > > Hi Christophe, > > Did you find a way to fix up your problem, cuz even with replication will > have this problem, lot of update means clear cache and manage that. > I've the same issue, I just wondering if I won't turn off servers during > update ??? > How did you fix that ? > > Thanks, > sunny > > > christophe-2 wrote: >> >> Hi, >> >> After fully reloading my index, using another field than a Data does not >> help that much. >> Using a warmup query avoids having the first request slow, but: >> - Frequents commits means that the Searcher is reloaded frequently >> and, as the warmup takes time, the clients must wait. >> - Having warmup slows down the index process (I guess this is >> because after a commit, the Searchers are recreated) >> >> So I'm considering, as suggested, to have two instances: one for >> indexing and one for searching. >> I was wondering if there are simple ways to replicate the index in a >> single Solr server running two cores ? Any such config already tested ? >> I guess that the standard replication based on rsync can be simplified a >> lot in this case as the two indexes are on the same server. >> >> Thanks >> Christophe >> >> Beniamin Janicki wrote: >>> :so you can send your updates anytime you want, and as long as you only >>> :commit every 5 minutes (or commit on a master as often as you want, but >>> :only run snappuller/snapinstaller on your slaves every 5 minutes) your >>> :results will be at most 5minutes + warming time stale.
>>> >>> This is what I do as well ( commits are done once per 5 minutes ). I've >>> got >>> master - slave configuration. Master has turned off all caches >>> (commented in >>> solrconfig.cml) and setup only 2 maxWarmingSearchers. Index size has 5GB >>> ,Xmx= 1GB and committing takes around 10 secs ( on default configuration >>> with warming it took from 30 mins up to 2 hours). >>> >>> Slave caches are configured to have autowarmCount="0" and >>> maxWarmingSearchers=1 , and I have new data 1 second after snapshoot is >>> done. I haven't noticed any huge delays while serving search request. >>> Try to use those values - may be they'll help in your case too. >>> >>> Ben Janicki >>> >>> >>> -Original Message- >>> From: Chris Hostetter [mailto:hossman_luc...@fucit.org] >>> Sent: 22 October 2008 04:56 >>> To: solr-user@lucene.apache.org >>> Subject: Re: Sorting performance >>> >>> >>> : The problem is that I will have hundreds of users doing queries, and a >>> : continuous flow of document coming in. >>> : So a delay in warming up a cache "could" be acceptable if I do it a >>> few >>> times >>> : per day. But not on a too regular basis (right now, the first query >>> that >>> loads >>> : the cache takes 150s). >>> : >>> : However: I'm not sure why it looks not to be a good idea to update the >>> caches >>> >>> you can refresh the caches automaticly after updating, the "newSearcher" >>> event is fired whenever a searcher is opened (but before it's used by >>> clients) so you can configure warming queries for it -- it doesn't have >>> to >>> be done manually (or by the first user to use that reader) >>> >>> so you can send your updates anytime you want, and as long as you only >>> commit every 5 minutes (or commit on a master as often as you want, but >>> only run snappuller/snapinstaller on your slaves every 5 minutes) your >>> results will be at most 5minutes + warming time stale. 
>>> >>> >>> -Hoss >>> >>> >> >> > > -- View this message in context: http://www.nabble.com/Sorting-performance-tp20037712p25286018.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Sorting performance + replication of index between cores
Hi Christophe, Did you find a way to fix your problem? Even with replication we will have this problem: lots of updates mean clearing the cache and managing that. I have the same issue; I'm just wondering whether I should turn off servers during updates. How did you fix that? Thanks, sunny christophe-2 wrote: > > Hi, > > After fully reloading my index, using another field than a Data does not > help that much. > Using a warmup query avoids having the first request slow, but: > - Frequents commits means that the Searcher is reloaded frequently > and, as the warmup takes time, the clients must wait. > - Having warmup slows down the index process (I guess this is > because after a commit, the Searchers are recreated) > > So I'm considering, as suggested, to have two instances: one for > indexing and one for searching. > I was wondering if there are simple ways to replicate the index in a > single Solr server running two cores ? Any such config already tested ? > I guess that the standard replication based on rsync can be simplified a > lot in this case as the two indexes are on the same server. > > Thanks > Christophe > > Beniamin Janicki wrote: >> :so you can send your updates anytime you want, and as long as you only >> :commit every 5 minutes (or commit on a master as often as you want, but >> :only run snappuller/snapinstaller on your slaves every 5 minutes) your >> :results will be at most 5minutes + warming time stale. >> >> This is what I do as well ( commits are done once per 5 minutes ). I've >> got >> master - slave configuration. Master has turned off all caches (commented >> in >> solrconfig.cml) and setup only 2 maxWarmingSearchers. Index size has 5GB >> ,Xmx= 1GB and committing takes around 10 secs ( on default configuration >> with warming it took from 30 mins up to 2 hours). >> >> Slave caches are configured to have autowarmCount="0" and >> maxWarmingSearchers=1 , and I have new data 1 second after snapshoot is >> done.
I haven't noticed any huge delays while serving search request. >> Try to use those values - may be they'll help in your case too. >> >> Ben Janicki >> >> >> -Original Message- >> From: Chris Hostetter [mailto:hossman_luc...@fucit.org] >> Sent: 22 October 2008 04:56 >> To: solr-user@lucene.apache.org >> Subject: Re: Sorting performance >> >> >> : The problem is that I will have hundreds of users doing queries, and a >> : continuous flow of document coming in. >> : So a delay in warming up a cache "could" be acceptable if I do it a few >> times >> : per day. But not on a too regular basis (right now, the first query >> that >> loads >> : the cache takes 150s). >> : >> : However: I'm not sure why it looks not to be a good idea to update the >> caches >> >> you can refresh the caches automaticly after updating, the "newSearcher" >> event is fired whenever a searcher is opened (but before it's used by >> clients) so you can configure warming queries for it -- it doesn't have >> to >> be done manually (or by the first user to use that reader) >> >> so you can send your updates anytime you want, and as long as you only >> commit every 5 minutes (or commit on a master as often as you want, but >> only run snappuller/snapinstaller on your slaves every 5 minutes) your >> results will be at most 5minutes + warming time stale. >> >> >> -Hoss >> >> > > -- View this message in context: http://www.nabble.com/Sorting-performance-tp20037712p23094174.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Sorting performance + replication of index between cores
Hi, After fully reloading my index, using a field other than a Date does not help that much. Using a warmup query keeps the first request from being slow, but: - Frequent commits mean that the Searcher is reloaded frequently and, as the warmup takes time, the clients must wait. - Warmup slows down the indexing process (I guess this is because the Searchers are recreated after each commit). So I'm considering, as suggested, having two instances: one for indexing and one for searching. Are there simple ways to replicate the index within a single Solr server running two cores? Has anyone already tested such a config? I guess the standard rsync-based replication could be simplified a lot in this case, since the two indexes are on the same server. Thanks Christophe Beniamin Janicki wrote: :so you can send your updates anytime you want, and as long as you only :commit every 5 minutes (or commit on a master as often as you want, but :only run snappuller/snapinstaller on your slaves every 5 minutes) your :results will be at most 5minutes + warming time stale. This is what I do as well ( commits are done once per 5 minutes ). I've got master - slave configuration. Master has turned off all caches (commented in solrconfig.cml) and setup only 2 maxWarmingSearchers. Index size has 5GB ,Xmx= 1GB and committing takes around 10 secs ( on default configuration with warming it took from 30 mins up to 2 hours). Slave caches are configured to have autowarmCount="0" and maxWarmingSearchers=1 , and I have new data 1 second after snapshoot is done. I haven't noticed any huge delays while serving search request. Try to use those values - may be they'll help in your case too. Ben Janicki -Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: 22 October 2008 04:56 To: solr-user@lucene.apache.org Subject: Re: Sorting performance : The problem is that I will have hundreds of users doing queries, and a : continuous flow of document coming in.
: So a delay in warming up a cache "could" be acceptable if I do it a few times : per day. But not on a too regular basis (right now, the first query that loads : the cache takes 150s). : : However: I'm not sure why it looks not to be a good idea to update the caches you can refresh the caches automaticly after updating, the "newSearcher" event is fired whenever a searcher is opened (but before it's used by clients) so you can configure warming queries for it -- it doesn't have to be done manually (or by the first user to use that reader) so you can send your updates anytime you want, and as long as you only commit every 5 minutes (or commit on a master as often as you want, but only run snappuller/snapinstaller on your slaves every 5 minutes) your results will be at most 5minutes + warming time stale. -Hoss
Re: Sorting performance
Hi, I'm now reloading my index. The issue might be related to the way dates are handled (I was sorting on a date field). I have now added an integer field that represents the date (in minutes instead of milliseconds). With 4M documents (and indexing running in the background), the response time is acceptable, even for the first query. I still want to check with 10M documents and more. Once my index is fully loaded, I will try the config parameters you suggested. Thanks Christophe Beniamin Janicki wrote: :so you can send your updates anytime you want, and as long as you only :commit every 5 minutes (or commit on a master as often as you want, but :only run snappuller/snapinstaller on your slaves every 5 minutes) your :results will be at most 5minutes + warming time stale. This is what I do as well ( commits are done once per 5 minutes ). I've got master - slave configuration. Master has turned off all caches (commented in solrconfig.cml) and setup only 2 maxWarmingSearchers. Index size has 5GB ,Xmx= 1GB and committing takes around 10 secs ( on default configuration with warming it took from 30 mins up to 2 hours). Slave caches are configured to have autowarmCount="0" and maxWarmingSearchers=1 , and I have new data 1 second after snapshoot is done. I haven't noticed any huge delays while serving search request. Try to use those values - may be they'll help in your case too. Ben Janicki -Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: 22 October 2008 04:56 To: solr-user@lucene.apache.org Subject: Re: Sorting performance : The problem is that I will have hundreds of users doing queries, and a : continuous flow of document coming in.
: : However: I'm not sure why it looks not to be a good idea to update the caches you can refresh the caches automaticly after updating, the "newSearcher" event is fired whenever a searcher is opened (but before it's used by clients) so you can configure warming queries for it -- it doesn't have to be done manually (or by the first user to use that reader) so you can send your updates anytime you want, and as long as you only commit every 5 minutes (or commit on a master as often as you want, but only run snappuller/snapinstaller on your slaves every 5 minutes) your results will be at most 5minutes + warming time stale. -Hoss
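The integer field described above can be derived from the timestamp at index time. A minimal Python sketch, assuming UTC input and rounding down to the whole minute; `to_epoch_minutes` is a hypothetical helper, not part of Solr:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def to_epoch_minutes(ts: datetime) -> int:
    """Minutes since 1970-01-01T00:00Z, rounded down, for a 32-bit int sort field."""
    if ts.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Exact integer milliseconds, i.e. what a millisecond-precision date field holds.
    millis = (ts - EPOCH) // timedelta(milliseconds=1)
    minutes = millis // 60_000  # floor division drops the seconds and milliseconds
    if not INT32_MIN <= minutes <= INT32_MAX:
        raise OverflowError("value does not fit in a 32-bit int field")
    return minutes


if __name__ == "__main__":
    ts = datetime(2008, 10, 22, 4, 56, 30, tzinfo=timezone.utc)
    print(to_epoch_minutes(ts))
```

Values stay far below the 32-bit limit for any realistic date, and documents within the same minute simply tie, which is the precision trade-off described above.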
RE: Sorting performance
:so you can send your updates anytime you want, and as long as you only :commit every 5 minutes (or commit on a master as often as you want, but :only run snappuller/snapinstaller on your slaves every 5 minutes) your :results will be at most 5minutes + warming time stale. This is what I do as well (commits are done once every 5 minutes). I've got a master-slave configuration. The master has all caches turned off (commented out in solrconfig.xml) and maxWarmingSearchers set to 2. The index size is 5GB, -Xmx is 1GB, and committing takes around 10 secs (with the default configuration, including warming, it took from 30 minutes up to 2 hours). Slave caches are configured with autowarmCount="0" and maxWarmingSearchers=1, and I have new data 1 second after the snapshot is done. I haven't noticed any huge delays while serving search requests. Try those values; maybe they'll help in your case too. Ben Janicki -Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: 22 October 2008 04:56 To: solr-user@lucene.apache.org Subject: Re: Sorting performance : The problem is that I will have hundreds of users doing queries, and a : continuous flow of document coming in. : So a delay in warming up a cache "could" be acceptable if I do it a few times : per day. But not on a too regular basis (right now, the first query that loads : the cache takes 150s). : : However: I'm not sure why it looks not to be a good idea to update the caches you can refresh the caches automaticly after updating, the "newSearcher" event is fired whenever a searcher is opened (but before it's used by clients) so you can configure warming queries for it -- it doesn't have to be done manually (or by the first user to use that reader) so you can send your updates anytime you want, and as long as you only commit every 5 minutes (or commit on a master as often as you want, but only run snappuller/snapinstaller on your slaves every 5 minutes) your results will be at most 5minutes + warming time stale. -Hoss
Re: Sorting performance
: The problem is that I will have hundreds of users doing queries, and a : continuous flow of document coming in. : So a delay in warming up a cache "could" be acceptable if I do it a few times : per day. But not on a too regular basis (right now, the first query that loads : the cache takes 150s). : : However: I'm not sure why it looks not to be a good idea to update the caches you can refresh the caches automatically after updating: the "newSearcher" event is fired whenever a searcher is opened (but before it's used by clients), so you can configure warming queries for it -- it doesn't have to be done manually (or by the first user to use that reader). so you can send your updates anytime you want, and as long as you only commit every 5 minutes (or commit on a master as often as you want, but only run snappuller/snapinstaller on your slaves every 5 minutes), your results will be at most 5 minutes + warming time stale. -Hoss
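The "warm before clients see it" behaviour described above can be modelled in a few lines. This is a conceptual Python sketch, not Solr's implementation: a hypothetical `SearcherHolder` builds and warms the sort cache for each new snapshot before swapping the reference that queries read, so no user pays the first-sort cost. Until the old searcher is released, both caches are in memory at once.

```python
import threading
from typing import Dict, List, Optional


class Searcher:
    """Stand-in for an index snapshot plus its per-snapshot sort cache."""

    def __init__(self, docs: Dict[int, int]):
        self.docs = docs  # doc id -> sort field value
        self.sort_cache: Optional[List[int]] = None  # built lazily, like a field cache

    def warm(self) -> None:
        # The expensive step: one pass over every document in the snapshot.
        self.sort_cache = sorted(self.docs, key=self.docs.__getitem__)

    def sorted_ids(self) -> List[int]:
        if self.sort_cache is None:  # an unwarmed searcher makes its first query pay
            self.warm()
        return self.sort_cache


class SearcherHolder:
    """Queries always see a fully warmed searcher; commits swap in a new one."""

    def __init__(self, docs: Dict[int, int]):
        self._lock = threading.Lock()
        first = Searcher(dict(docs))
        first.warm()
        self._current = first

    def current(self) -> Searcher:
        with self._lock:
            return self._current

    def commit(self, docs: Dict[int, int]) -> None:
        new = Searcher(dict(docs))
        new.warm()  # warm *before* any client can see it
        with self._lock:
            # The old searcher stays alive only while something still references it.
            self._current = new
```

The swap itself is cheap; the cost moves into `commit`, which is why frequent commits combined with slow warming show up as staler results rather than slow queries.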
Re: Sorting performance
I'm now wondering whether Solr (Lucene) is a good choice when we have a huge number of indexed documents and a large number of new documents need to be indexed every day. Maybe I'm wrong, but my feeling is that, given the way the sort caches are handled (recreated after each commit, not shared between Searchers), the solution does not scale. And it is not just a memory issue (memory is cheap), but rather the fact that an existing cache is never updated. I'm testing whether I can sort on a field that might be faster to cache: any hints on this? Would it make a difference if I used a field with fewer distinct values than a timestamp? I'm looking for some details on how the cache is populated on the first query. Also, for the code insiders ;-), would it be difficult to change this caching mechanism to allow an existing cache to be updated and reused? Thanks for your help Christophe christophe wrote: The problem is that I will have hundreds of users doing queries, and a continuous flow of document coming in. So a delay in warming up a cache "could" be acceptable if I do it a few times per day. But not on a too regular basis (right now, the first query that loads the cache takes 150s). However: I'm not sure why it looks not to be a good idea to update the caches when updates are committed ? Any centralized cache (memcached is a good one) that is maintained up to date by the update/commit process would be great. Config options could then let to the user to decide if the cache is shared between servers or not. Creating a new cache and then swap it will double the necessary memory. I also have a related questions regarding readers: a new reader is opened when documents are committed. And the cache is associated with the reader (if I got it right). Are all user requests served by this reader ? How does that scale if I have many concurrent users ? C. Norberto Meijome wrote: On Mon, 20 Oct 2008 16:28:23 +0300 christophe <[EMAIL PROTECTED]> wrote: Hum.
this mean I have to wait before I index new documents and avoid indexing when they are created (I have about 50 000 new documents created each day and I was planning to make those searchable ASAP). you can always index + optimize out of band in a 'master' / RW server , and then send the updated index to your slave (the one actually serving the requests). This *will NOT* remove the need to refresh your cache, but it will remove any delay introduced by commit/indexing + optimise. Too bad there is no way to have a centralized cache that can be shared AND updated when new documents are created. hmm not sure it makes sense like that... but maybe along the lines of having an active cache that is used to serve queries, and new ones being prepared, and then swapped when ready. Speaking of which (or not :P) , has anyone thought about / done any work on using memcached for these internal solr caches? I guess it would make sense for setups with several slaves ( or even a master updating memcached too...)...though for a setup with shards it would be slightly more involved (although it *could* be used to support several slaves per 'data shard' ). All the best, B _ {Beto|Norberto|Numard} Meijome RTFM and STFW before anything bad happens. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
RE: Sorting performance
According to previous posters on this topic, sorting requires an array with an entry per document in the entire index. Each entry has 32 bits for the 'int' type, and 32 bits plus the field representation length for other types. Not knowing Lucene internals, I have a hard time believing that it really has to be this wasteful, but oh well. Since 'sint' is needed to do range queries on a field, and 'int' is needed for efficient sorting, we wound up having one field of each type and a to make sure they both get the same numbers. Yes, it's annoying. -Original Message- From: Mark Miller [mailto:[EMAIL PROTECTED] Sent: Monday, October 20, 2008 6:24 AM To: solr-user@lucene.apache.org Subject: Re: Sorting performance christophe wrote: > When I start indexing new documents, searches are taking long time > again: is the sort cache flushed when new documents are indexed ? When you commit, a new Reader will be opened (or reopened) so that the freshly added docs can be seen. This would make the first search slow again, but if you have the warming queries, it should be warmed before being put into use. Be sure the warming query sorts on the right field. > > Are there any metrics on how to compute memory requirements (based on > doc average size, number of sorted fields, number of indexed documents > + number of new document / day) ? Depends on the field type, but I think its 32bits x numDocs for most datatypes, with the String datatype also requiring an array of all the unique terms to index into. Thats not everything, but it dominates. > Thanks > Christophe > Mark Miller wrote: >> You need to setup a warming query that sorts so that the initial long >> query is done behind the scenes. Users first query will then be fast. >> Solrconfig. >> >> - Mark >> >> >> On Oct 18, 2008, at 1:34 AM, christophe <[EMAIL PROTECTED]> >> wrote: >> >>> Here are the memory parameters I'm using now(Tomcat): -Xms2024m >>> -Xmx2024m >>> With those values, the second query is way faster.
Only the first >>> one is very slow. >>> Thanks for the tip. >>> However, I'm wondering if will be enough and I will not hit the same >>> issues when I will have many users searching at the same time: I >>> will do a stress test to check this. >>> >>> Thanks >>> Christophe >>> >>> christophe wrote: >>>> It is slow each time I run it. (I test it from the Solr admin >>>> console or from a JAVA program using the Http client). >>>> I do not get the OOM each time. >>>> >>>> Thx >>>> Christophe >>>> >>>> Otis Gospodnetic wrote: >>>>> Is the sorted query slow only the first time or every time you run >>>>> it? >>>>> >>>>> You got an OOM? What -Xmx value are you using? Try increasing it. >>>>> >>>>> Otis >>>>> -- >>>>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch >>>>> >>>>> >>>>> >>>>> - Original Message >>>>> >>>>>> From: christophe <[EMAIL PROTECTED]> >>>>>> To: solr-user@lucene.apache.org >>>>>> Sent: Friday, October 17, 2008 1:28:52 PM >>>>>> Subject: Sorting performance >>>>>> Hi, >>>>>> >>>>>> I'm doing some tests with Solr1.3 >>>>>> I have loaded around 7M documents, each with a few stored and >>>>>> indexed fields. >>>>>> >>>>>> This query: text:sometext returns the results, sorted by score in >>>>>> a few milliseconds. (I display 10 out of 8747 matched documents) >>>>>> This one: text:sometext;id desc takes something like 60s or >>>>>> more to return the data (when it doesn't fails with an out of >>>>>> memory error). (id is a string type). >>>>>> I have tried to display only id, same results. >>>>>> >>>>>> Any ideas ? I'm sure I'm doing something wrong. >>>>>> >>>>>> My schema is based on the sample, with the following fields: >>>>>> >>>>>> /> multiValued="true" /> >>>>>> default="NOW" multiValued="false"/> >>>>>> >>>>>> Thanks >>>>>> Christophe >>>>>> >>>>> >>>>> >>>> >>> >
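Why 'sint' and 'int' behave differently for range queries: index terms are compared as strings, so plain decimal text puts "10" before "9". A minimal Python sketch of the general sortable-encoding trick follows. It offsets the value by 2**31 (equivalent to flipping the sign bit) and writes it at a fixed width, so string order matches numeric order. It illustrates the idea only; Solr's actual 'sint' encoding may differ, and `sortable_str` is a hypothetical name.

```python
def sortable_str(n: int) -> str:
    """Encode a signed 32-bit int so that string order equals numeric order."""
    if not -2**31 <= n < 2**31:
        raise OverflowError("not a 32-bit int")
    # Adding 2**31 flips the sign bit, so negatives map below positives;
    # fixed-width hex avoids the "10" < "9" problem of plain decimal text.
    return format(n + 2**31, "08x")


if __name__ == "__main__":
    values = [9, 10, -1, -200, 0]
    print(sorted(values, key=str))           # plain text order: not numeric
    print(sorted(values, key=sortable_str))  # encoded order: numeric
```

The first line prints `[-1, -200, 0, 10, 9]` and the second `[-200, -1, 0, 9, 10]`.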
Re: Sorting performance
The problem is that I will have hundreds of users doing queries, and a continuous flow of documents coming in. So a delay in warming up a cache "could" be acceptable if I do it a few times per day, but not on a too-regular basis (right now, the first query that loads the cache takes 150s). However, I'm not sure why it seems not to be a good idea to update the caches when updates are committed. Any centralized cache (memcached is a good one) that is kept up to date by the update/commit process would be great. Config options could then let the user decide whether the cache is shared between servers or not. Creating a new cache and then swapping it in will double the necessary memory. I also have a related question regarding readers: a new reader is opened when documents are committed, and the cache is associated with the reader (if I got it right). Are all user requests served by this reader? How does that scale if I have many concurrent users? C. Norberto Meijome wrote: On Mon, 20 Oct 2008 16:28:23 +0300 christophe <[EMAIL PROTECTED]> wrote: Hum. this mean I have to wait before I index new documents and avoid indexing when they are created (I have about 50 000 new documents created each day and I was planning to make those searchable ASAP). you can always index + optimize out of band in a 'master' / RW server , and then send the updated index to your slave (the one actually serving the requests). This *will NOT* remove the need to refresh your cache, but it will remove any delay introduced by commit/indexing + optimise. Too bad there is no way to have a centralized cache that can be shared AND updated when new documents are created. hmm not sure it makes sense like that... but maybe along the lines of having an active cache that is used to serve queries, and new ones being prepared, and then swapped when ready. Speaking of which (or not :P) , has anyone thought about / done any work on using memcached for these internal solr caches?
I guess it would make sense for setups with several slaves ( or even a master updating memcached too...)...though for a setup with shards it would be slightly more involved (although it *could* be used to support several slaves per 'data shard' ). All the best, B _ {Beto|Norberto|Numard} Meijome RTFM and STFW before anything bad happens. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Sorting performance
On Mon, 20 Oct 2008 16:28:23 +0300 christophe <[EMAIL PROTECTED]> wrote: > Hum. this mean I have to wait before I index new documents and avoid > indexing when they are created (I have about 50 000 new documents > created each day and I was planning to make those searchable ASAP). you can always index + optimize out of band in a 'master' / RW server , and then send the updated index to your slave (the one actually serving the requests). This *will NOT* remove the need to refresh your cache, but it will remove any delay introduced by commit/indexing + optimise. > Too bad there is no way to have a centralized cache that can be shared > AND updated when new documents are created. hmm not sure it makes sense like that... but maybe along the lines of having an active cache that is used to serve queries, and new ones being prepared, and then swapped when ready. Speaking of which (or not :P) , has anyone thought about / done any work on using memcached for these internal solr caches? I guess it would make sense for setups with several slaves ( or even a master updating memcached too...)...though for a setup with shards it would be slightly more involved (although it *could* be used to support several slaves per 'data shard' ). All the best, B _ {Beto|Norberto|Numard} Meijome RTFM and STFW before anything bad happens. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Sorting performance
Hum. This means I have to wait before I index new documents, and avoid indexing them when they are created (I have about 50,000 new documents created each day and I was planning to make those searchable ASAP). Too bad there is no way to have a centralized cache that can be shared AND updated when new documents are created. C.
Re: Sorting performance
christophe wrote: > When I start indexing new documents, searches are taking a long time again: is the sort cache flushed when new documents are indexed? When you commit, a new Reader will be opened (or reopened) so that the freshly added docs can be seen. This would make the first search slow again, but if you have warming queries configured, the new searcher should be warmed before it is put into use. Be sure the warming query sorts on the right field. > Are there any metrics on how to compute memory requirements (based on average doc size, number of sorted fields, number of indexed documents + number of new documents / day)? It depends on the field type, but I think it's 32 bits x numDocs for most datatypes, with the String datatype also requiring an array of all the unique terms to index into. That's not everything, but it dominates.
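To turn that rule of thumb into numbers for the 7M-document index from the original post, a back-of-envelope Java sketch might look like this. The 4 bytes per document is the figure given above; the 40-byte per-String overhead, 8-byte references and 10-character average id length are assumptions for illustration only, so measure on your own JVM and data.

```java
// Back-of-envelope sort-cache size for one sort field on one searcher, following the
// rule of thumb above: 32 bits per document, plus (for string fields) an array of
// every unique term. All per-term figures are assumptions, not measured values.
final class SortCacheEstimate {

    // Hypothetical per-String object overhead (object header, fields, char[] header).
    static final long ASSUMED_STRING_OVERHEAD_BYTES = 40;

    // 32 bits (4 bytes) per document.
    static long perDocBytes(long numDocs) {
        return 4L * numDocs;
    }

    // Unique-terms array: one reference plus one String per unique term.
    // Assumes 2 bytes per char (UTF-16), as on JVMs of that era.
    static long uniqueTermBytes(long numUniqueTerms, int avgTermChars, int referenceBytes) {
        long perTerm = referenceBytes + ASSUMED_STRING_OVERHEAD_BYTES + 2L * avgTermChars;
        return numUniqueTerms * perTerm;
    }

    public static void main(String[] args) {
        long numDocs = 7_000_000L; // from the original post
        long uniqueIds = numDocs;  // a unique key has one distinct term per document
        int avgIdChars = 10;       // hypothetical average id length
        long total = perDocBytes(numDocs) + uniqueTermBytes(uniqueIds, avgIdChars, 8);
        System.out.printf("~%d MB for one sort field on one searcher%n", total / 1_000_000);
    }
}
```

Under those assumed numbers this prints `~504 MB for one sort field on one searcher`, and almost all of it is the unique-terms array, because sorting on a unique id means one distinct term per document. While a new searcher is warming, the old one can still be holding its own copy, so the peak can briefly be about double.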
Re: Sorting performance
Caches are specific to each opened searcher. So whenever you open a new reader, the caches are rebuilt for that searcher. If you are picking up your changes, you MUST be opening a new reader, so yes, indeed, your caches are being flushed. You can get around this by firing a few warmup queries at the server before using it "for real". If you are opening a new reader for each request, well, you shouldn't do that. Best Erick On Mon, Oct 20, 2008 at 9:02 AM, christophe <[EMAIL PROTECTED]> wrote: > When I start indexing new documents, searches are taking long time again: > is the sort cache flushed when new documents are indexed ? > > Thanks > Christophe
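Erick's point that caches live with one particular searcher can be illustrated with a small Java sketch of a sort cache keyed by reader instance, loosely in the spirit of Lucene's FieldCache. The class names and the `builds` counter are hypothetical. The `ords` array maps each document to the rank of its sort value, i.e. the per-document array whose size is discussed in this thread; the sketch assumes unique values, as for a unique key field.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical sketch: a sort cache keyed by reader instance. A new reader is a new
// key, so its cache is empty and the first sort against it pays the full build cost.
final class PerReaderSortCache {

    // Stand-in for an index reader: the sort field's value for each document number.
    static final class Reader {
        final String[] ids;

        Reader(String... ids) {
            this.ids = ids.clone();
        }
    }

    // Keyed by reader identity; entries can be collected once a reader is dropped.
    private final Map<Reader, int[]> ordsByReader = new WeakHashMap<>();
    int builds = 0; // counts expensive cache builds, for illustration only

    // Returns each document's ordinal (rank of its id), building it on first use.
    synchronized int[] ords(Reader reader) {
        return ordsByReader.computeIfAbsent(reader, r -> {
            builds++;
            String[] sorted = r.ids.clone();
            Arrays.sort(sorted); // the expensive part on a large index
            int[] ords = new int[r.ids.length];
            for (int doc = 0; doc < r.ids.length; doc++) {
                ords[doc] = Arrays.binarySearch(sorted, r.ids[doc]);
            }
            return ords;
        });
    }
}
```

A warmup query that sorts on the same field amounts to calling `ords(newReader)` before users are switched to the new reader; sorting a result set is then just comparing the cached integers.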
Re: Sorting performance
When I start indexing new documents, searches are taking a long time again: is the sort cache flushed when new documents are indexed? Thanks Christophe
Re: Sorting performance
Will do so. Thanks. Are there any metrics on how to compute memory requirements (based on average doc size, number of sorted fields, number of indexed documents + number of new documents / day)? Thanks Christophe
Re: Sorting performance
You need to set up a warming query that sorts, so that the initial long query is done behind the scenes. Users' first query will then be fast. This is configured in solrconfig.xml. - Mark
Re: Sorting performance
Here are the memory parameters I'm using now (Tomcat): -Xms2024m -Xmx2024m. With those values, the second query is way faster; only the first one is very slow. Thanks for the tip. However, I'm wondering whether this will be enough, or whether I will hit the same issues when many users are searching at the same time: I will do a stress test to check this. Thanks Christophe
Re: Sorting performance
It is slow each time I run it (I test it from the Solr admin console or from a Java program using the HTTP client). I do not get the OOM each time. Thx Christophe
Re: Sorting performance
Is the sorted query slow only the first time or every time you run it? You got an OOM? What -Xmx value are you using? Try increasing it. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
Sorting performance
Hi, I'm doing some tests with Solr 1.3. I have loaded around 7M documents, each with a few stored and indexed fields. This query: text:sometext returns the results, sorted by score, in a few milliseconds (I display 10 out of 8747 matched documents). This one: text:sometext;id desc takes something like 60s or more to return the data (when it doesn't fail with an out of memory error). The id field is a string type. I have tried displaying only id, with the same results. Any ideas? I'm sure I'm doing something wrong. My schema is based on the sample, with the following fields: Thanks Christophe