Re: timeAllowed is not honored
Apologies for the late reply, and thanks Toke for a great explanation :) I am new to Solr and unaware of DocValues, so could you please explain? With Regards Aman Tandon On Fri, May 2, 2014 at 1:52 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote: On Thu, 2014-05-01 at 23:03 +0200, Aman Tandon wrote: So can you explain how enum is faster than the default? The fundamental difference is that enum iterates terms and counts how many of the documents associated with the terms are in the hits, while fc iterates all hits and updates a counter for the term associated with the document. A bit too simplified, we have enum: terms->docs, fc: hits->terms. enum wins when there are relatively few unique terms, and it is much less affected by index updates than fc. As Shawn says, you are best off testing. We are planning to move to SolrCloud with Solr 4.7.1, so will this 14 GB of RAM be sufficient, or should we increase it? Switching to SolrCloud does not change your fundamental memory requirements for searching. The merging part adds some overhead, but with a heap of 14GB, I would be surprised if that would require an increase. Consider using DocValues for facet fields with many unique values, for getting both speed and low memory usage at the cost of increased index size. - Toke Eskildsen, State and University Library, Denmark
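To flesh out Toke's DocValues suggestion: DocValues are column-oriented per-field values written at index time, which faceting can read without building the fieldcache on the heap. They are enabled per field in schema.xml. A minimal sketch, using the city field that is faceted on later in this thread; the type name is an assumption about the schema, not taken from it:

```xml
<!-- schema.xml: docValues works on non-analyzed field types
     such as string and the trie numeric types -->
<field name="city" type="string" indexed="true" stored="true" docValues="true"/>
```

Changing docValues on an existing field requires a reindex, which is part of the index-size cost Toke mentions.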
Re: timeAllowed is not honored
On Thu, 2014-05-01 at 23:38 +0200, Shawn Heisey wrote: I was surprised to read that fc uses less memory. I think that is an error in the documentation. Except for special cases, such as asking for all facet values on a high cardinality field, I would estimate that enum uses less memory than fc. - Toke Eskildsen, State and University Library, Denmark
Re: timeAllowed is not honored
On Thu, 2014-05-01 at 23:03 +0200, Aman Tandon wrote: So can you explain how enum is faster than the default? The fundamental difference is that enum iterates terms and counts how many of the documents associated with the terms are in the hits, while fc iterates all hits and updates a counter for the term associated with the document. A bit too simplified, we have enum: terms->docs, fc: hits->terms. enum wins when there are relatively few unique terms, and it is much less affected by index updates than fc. As Shawn says, you are best off testing. We are planning to move to SolrCloud with Solr 4.7.1, so will this 14 GB of RAM be sufficient, or should we increase it? Switching to SolrCloud does not change your fundamental memory requirements for searching. The merging part adds some overhead, but with a heap of 14GB, I would be surprised if that would require an increase. Consider using DocValues for facet fields with many unique values, for getting both speed and low memory usage at the cost of increased index size. - Toke Eskildsen, State and University Library, Denmark
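Toke's terms->docs versus hits->terms distinction can be sketched in a few lines of Python. This is a toy model of the two counting strategies, not Solr's actual implementation; it assumes a single-valued field and a tiny hand-made inverted index:

```python
# Sketch of the two facet-counting strategies Toke describes:
# enum iterates terms -> docs, fc iterates hits -> terms.

from collections import Counter

# toy inverted index: term -> set of doc ids (single-valued field)
index = {"delhi": {0, 1, 4}, "mumbai": {2, 3}, "pune": {5}}
# forward view used by fc: doc id -> term (the "fieldcache")
doc_term = {d: t for t, docs in index.items() for d in docs}
hits = {0, 2, 3, 4}  # docs matching the query

def facet_enum(index, hits):
    # terms -> docs: for each term, count how many of its docs are hits.
    # Cheap when there are relatively few unique terms.
    return {t: len(docs & hits) for t, docs in index.items() if docs & hits}

def facet_fc(doc_term, hits):
    # hits -> terms: for each hit, bump the counter of its term.
    # Requires the in-memory doc -> term mapping to exist first.
    return dict(Counter(doc_term[d] for d in hits))

assert facet_enum(index, hits) == facet_fc(doc_term, hits) == {"delhi": 2, "mumbai": 2}
```

Both strategies produce identical counts; they differ only in which axis they iterate, which is why the field's number of unique terms decides the winner.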
Re: timeAllowed is not honored
On 4/30/2014 5:53 PM, Aman Tandon wrote: Shawn - Yes, we have some plans to move to SolrCloud. Our total index size is 40GB with 11M docs, available RAM is 32GB, allowed heap space for Solr is 14GB, and the GC tuning parameters used on our server are -XX:+UseConcMarkSweepGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDetails -XX:+PrintGCTimeStamps. This means that you have about 18GB of RAM left over to cache a 40GB index. That's less than 50 percent. Every index is different, but this is in the ballpark of where performance problems begin. If you had 48GB of RAM, your performance (not counting possible GC problems) would likely be very good. 64GB would be ideal. Your only GC tuning is switching the collector to CMS; the Print* flags are logging options, not tuning. This won't be enough. When I had a config like this and a heap of only 8GB, I was seeing GC pauses of 10 to 12 seconds. http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning One question: Do you really need 14GB of heap? One of my servers has a total of 65GB of index (54 million docs) with a 7GB heap and 64GB of RAM. Currently I don't use facets, though. When I do, they will be enum. If you switch all your facets to enum, your heap requirements may go down. Decreasing the heap size will make more memory available for index caching. Thanks, Shawn
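For illustration of what CMS tuning beyond merely enabling the collector can look like, here is a sketch using standard HotSpot flags. The values are examples only, not the specific settings from Shawn's wiki page:

```text
-Xms14g -Xmx14g
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:+CMSParallelRemarkEnabled
-XX:CMSInitiatingOccupancyFraction=70
-XX:+UseCMSInitiatingOccupancyOnly
```

The last two flags make CMS start its concurrent cycle when the old generation is 70% full instead of waiting until the heap is nearly exhausted, which is a common source of long stop-the-world pauses.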
Re: timeAllowed is not honored
Hi Shawn, Please check this link: http://wiki.apache.org/solr/SimpleFacetParameters#facet.method The facet.method wiki says: *The default value is fc (except for BoolField, which uses enum) since it tends to use less memory and is faster than the enumeration method when a field has many unique terms in the index.* So can you explain how enum is faster than the default? Also, we are currently using Solr 4.2; does that support facet.method=enum? If not, which version should we pick? We are planning to move to SolrCloud with Solr 4.7.1, so will this 14 GB of RAM be sufficient, or should we increase it? With Regards Aman Tandon On Thu, May 1, 2014 at 8:20 PM, Shawn Heisey s...@elyograg.org wrote: On 4/30/2014 5:53 PM, Aman Tandon wrote: Shawn - Yes, we have some plans to move to SolrCloud. Our total index size is 40GB with 11M docs, available RAM is 32GB, allowed heap space for Solr is 14GB, and the GC tuning parameters used on our server are -XX:+UseConcMarkSweepGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDetails -XX:+PrintGCTimeStamps. This means that you have about 18GB of RAM left over to cache a 40GB index. That's less than 50 percent. Every index is different, but this is in the ballpark of where performance problems begin. If you had 48GB of RAM, your performance (not counting possible GC problems) would likely be very good. 64GB would be ideal. Your only GC tuning is switching the collector to CMS. This won't be enough. When I had a config like this and a heap of only 8GB, I was seeing GC pauses of 10 to 12 seconds. http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning One question: Do you really need 14GB of heap? One of my servers has a total of 65GB of index (54 million docs) with a 7GB heap and 64GB of RAM. Currently I don't use facets, though. When I do, they will be enum. If you switch all your facets to enum, your heap requirements may go down. Decreasing the heap size will make more memory available for index caching. Thanks, Shawn
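On the version question above: facet.method dates back to Solr 1.4, so Solr 4.2 already supports facet.method=enum. It can be set per request or per field; a sketch using the city and datatype fields that appear in the facet output later in this thread:

```text
# per-request: applies to every facet.field in the request
facet=true&facet.field=city&facet.field=datatype&facet.method=enum

# per-field override, using Solr's f.<field>.<param> syntax
facet=true&facet.field=city&f.city.facet.method=enum
```

The per-field form lets you keep fc for low-cardinality fields while experimenting with enum only where it helps.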
Re: timeAllowed is not honored
On 5/1/2014 3:03 PM, Aman Tandon wrote: Please check this link: http://wiki.apache.org/solr/SimpleFacetParameters#facet.method The facet.method wiki says: *The default value is fc (except for BoolField, which uses enum) since it tends to use less memory and is faster than the enumeration method when a field has many unique terms in the index.* So can you explain how enum is faster than the default? Also, we are currently using Solr 4.2; does that support facet.method=enum? If not, which version should we pick? We are planning to move to SolrCloud with Solr 4.7.1, so will this 14 GB of RAM be sufficient, or should we increase it? The fc method (which means fieldcache) puts all the data required to build facets on that field into the fieldcache, and that data stays there until the next commit or restart. If you are committing frequently, that memory use might be wasted. I was surprised to read that fc uses less memory. It may be very true that the amount of memory required for a single call with facet.method=enum is more than the amount of memory required in the fieldcache for facet.method=fc, but that memory can be recovered as garbage -- with the fc method, it can't be recovered. It sits there, waiting for that facet to be used again, so it can speed it up. When you commit and open a new searcher, it gets thrown away. If you use a lot of different facets, the fieldcache can become HUGE with the fc method. *If you don't do all those facets at the same time* (a very important qualifier), you can switch to enum and the total amount of resident heap memory required will be a lot less. There may be a lot of garbage to collect, but the total heap requirement at any given moment should be smaller. If you actually need to build a lot of different facets at nearly the same time, enum may not actually help. The enum method is actually a little slower than fc for a single run, but the Java heap characteristics for multiple runs can cause enum to be faster in bulk. Try both and see what your results are. Thanks, Shawn
Re: timeAllowed is not honored
I had this issue too. timeAllowed only works for a certain phase of the query; I think that's the 'process' part. However, if the query is taking time in the 'prepare' phase (e.g., I think, for wildcards, to get all the possible combinations before running the query), it won't have any impact on that. You can debug your query and confirm that. On Wed, Apr 30, 2014 at 10:43 AM, Aman Tandon amantandon...@gmail.com wrote: Shawn, this is the first time I have raised this problem. My heap size is 14GB and I am not using SolrCloud currently; the 40GB index is replicated from master to two slaves. I read somewhere that it returns the partial results computed by the query within the amount of time defined by the timeAllowed parameter, but that doesn't seem to happen. Here is the link: http://wiki.apache.org/solr/CommonQueryParameters#timeAllowed *The time allowed for a search to finish. This value only applies to the search and not to requests in general. Time is in milliseconds. Values <= 0 mean no time restriction. Partial results may be returned (if there are any).* With Regards Aman Tandon On Wed, Apr 30, 2014 at 10:05 AM, Shawn Heisey s...@elyograg.org wrote: On 4/29/2014 10:05 PM, Aman Tandon wrote: I am using Solr 4.2 with an index size of 40GB. While querying my index, some queries take a significant amount of time, about 22 seconds, *in the case of minmatch of 50%*. So I added the parameter timeAllowed=2000 to my query, but it doesn't seem to work. Please help me out. I remember reading that timeAllowed has some limitations about which stages of a query it can limit, particularly in the distributed case. These limitations mean that it cannot always limit the total time for a query. I do not remember precisely what those limitations are, and I cannot find whatever it was that I was reading. When I looked through my local list archive to see if you had ever mentioned how much RAM you have and what the size of your Solr heap is, there didn't seem to be anything. There's not enough information for me to know whether that 40GB is the amount of index data on a single SolrCloud server, or whether it's the total size of the index across all servers. If we leave timeAllowed alone for a moment and treat this purely as a performance problem, usually my questions revolve around figuring out whether you have enough RAM. Here's where that conversation ends up: http://wiki.apache.org/solr/SolrPerformanceProblems I think I've probably mentioned this to you before on another thread. Thanks, Shawn -- Regards, Salman Akram
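Salman's "debug your query" suggestion, sketched as request parameters. These are standard Solr parameters; the q value is just this thread's example query:

```text
# full debug output, including a timing section per search component
q=misc+items&debugQuery=true

# Solr 4.x also accepts the narrower form that returns only timing
q=misc+items&debug=timing
```

The timing section splits QTime into prepare and process times for each component, which is how the facet-counting cost gets identified later in this thread.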
Re: timeAllowed is not honored
Hi Salman, here is my debug query dump, please help! I am unable to find the wildcards in it.

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <bool name="partialResults">true</bool>
    <int name="status">0</int>
    <int name="QTime">10080</int>
  </lst>
  <result name="response" numFound="976303" start="0"/>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="city">
        <int name="delhi ncr">884159</int>
        <int name="delhi">629472</int>
        <int name="mumbai">491426</int>
        <int name="ahmedabad">259356</int>
        <int name="chennai">259029</int>
        <int name="bengaluru">257193</int>
        <int name="kolkata">195077</int>
        <int name="pune">193569</int>
        <int name="hyderabad">179369</int>
        <int name="jaipur">115356</int>
        <int name="coimbatore">111644</int>
        <int name="noida">86794</int>
        <int name="surat">80621</int>
        <int name="gurgaon">72815</int>
        <int name="rajkot">68982</int>
        <int name="vadodara">65082</int>
        <int name="ludhiana">63244</int>
        <int name="thane">55091</int>
        <int name="indore">50225</int>
        <int name="ghaziabad">49756</int>
        <int name="faridabad">45322</int>
        <int name="navi mumbai">40127</int>
        <int name="tiruppur">37639</int>
        <int name="nagpur">37126</int>
        <int name="kochi">32874</int>
      </lst>
      <lst name="datatype">
        <int name="product">966816</int>
        <int name="offer">6003</int>
        <int name="company">3484</int>
      </lst>
    </lst>
    <lst name="facet_dates"/>
    <lst name="facet_ranges"/>
  </lst>
  <lst name="debug">
    <str name="rawquerystring">misc items</str>
    <str name="querystring">misc items</str>
    <str name="parsedquery">BoostedQuery(boost(+(((titlex:misc^1.5 | smalldesc:misc | titlews:misc^0.5 | city:misc | usrpcatname:misc | mcatnametext:misc^0.2)~0.3 (titlex:item^1.5 | smalldesc:item | titlews:items^0.5 | city:items | usrpcatname:item | mcatnametext:item^0.2)~0.3)~1) (mcatnametext:misc item^0.5)~0.3 (titlews:misc items)~0.3 (titlex:misc item^3.0)~0.3 (smalldesc:misc item^2.0)~0.3 (usrpcatname:misc item)~0.3 (),product(map(query(+(titlex:item imsw)~0.3 (),def=0.0),0.0,0.0,1.0),map(query(+(titlex:misc item imsw)~0.3 (),def=0.0),0.0,0.0,1.0),map(int(sdesclen),0.0,150.0,1.0),map(int(sdesclen),0.0,0.0,0.1),map(int(CustTypeWt),699.0,699.0,1.2),map(int(CustTypeWt),199.0,199.0,1.3),map(int(CustTypeWt),0.0,179.0,1.35),1.0/(3.16E-11*float(ms(const(1398852652419),date(lastactiondatet)))+1.0),map(ms(const(1398852652419),date(blpurchasedate)),0.0,2.6E9,1.15),map(query(+(attribs:hot)~0.3 (titlex:hot^3.0 | smalldesc:hot^2.0 | titlews:hot | city:hot | usrpcatname:hot | mcatnametext:hot^0.5)~0.3,def=0.0),0.0,0.0,1.0),map(query(+(attribs:dupimg)~0.3 (titlex:dupimg^3.0 | smalldesc:dupimg^2.0 | titlews:dupimg | city:dupimg | usrpcatname:dupimg | mcatnametext:dupimg^0.5)~0.3,def=0.0),0.0,0.0,1.0),map(query(+(isphoto:T)~0.3 (),def=0.0),0.0,0.0,0.1</str>
    <str name="parsedquery_toString">boost(+(((titlex:misc^1.5 | smalldesc:misc | titlews:misc^0.5 | city:misc | usrpcatname:misc | mcatnametext:misc^0.2)~0.3 (titlex:item^1.5 | smalldesc:item | titlews:items^0.5 | city:items | usrpcatname:item | mcatnametext:item^0.2)~0.3)~1) (mcatnametext:misc item^0.5)~0.3 (titlews:misc items)~0.3 (titlex:misc item^3.0)~0.3 (smalldesc:misc item^2.0)~0.3 (usrpcatname:misc item)~0.3 (),product(map(query(+(titlex:item imsw)~0.3 (),def=0.0),0.0,0.0,1.0),map(query(+(titlex:misc item imsw)~0.3 (),def=0.0),0.0,0.0,1.0),map(int(sdesclen),0.0,150.0,1.0),map(int(sdesclen),0.0,0.0,0.1),map(int(CustTypeWt),699.0,699.0,1.2),map(int(CustTypeWt),199.0,199.0,1.3),map(int(CustTypeWt),0.0,179.0,1.35),1.0/(3.16E-11*float(ms(const(1398852652419),date(lastactiondatet)))+1.0),map(ms(const(1398852652419),date(blpurchasedate)),0.0,2.6E9,1.15),map(query(+(attribs:hot)~0.3 (titlex:hot^3.0 | smalldesc:hot^2.0 | titlews:hot | city:hot | usrpcatname:hot | mcatnametext:hot^0.5)~0.3,def=0.0),0.0,0.0,1.0),map(query(+(attribs:dupimg)~0.3 (titlex:dupimg^3.0 | smalldesc:dupimg^2.0 | titlews:dupimg | city:dupimg | usrpcatname:dupimg | mcatnametext:dupimg^0.5)~0.3,def=0.0),0.0,0.0,1.0),map(query(+(isphoto:T)~0.3 (),def=0.0),0.0,0.0,0.1)))</str>
    <lst name="explain"/>
    <str name="QParser">SynonymExpandingExtendedDismaxQParser</str>
    <null name="altquerystring"/>
    <null name="boost_queries"/>
    <arr name="parsed_boost_queries"/>
    <null name="boostfuncs"/>
    <arr name="filter_queries">
      <str>{!tag=cityf}latlong:Intersects(Circle(28.63576,77.22445 d=2.248))</str>
      <str>attribs:(locprefglobal locprefnational locprefcity)</str>
      <str>+((+datatype:product +attribs:(aprstatus20 aprstatus40 aprstatus50) +aggregate:true -attribs:liststatusnfl +((+countryiso:IN +isfcp:true) CustTypeWt:[149 TO 1499])) (+datatype:offer +iildisplayflag:true) (+datatype:company -attribs:liststatusnfl +((+countryiso:IN +isfcp:true) CustTypeWt:[149 TO 1499]))) -attribs:liststatusdnf</str>
    </arr>
    <arr name="parsed_filter_queries">
      <str>ConstantScore(org.apache.lucene.spatial.prefix.IntersectsPrefixTreeFilter@414cd6c2)</str>
      <str>attribs:locprefglobal attribs:locprefnational attribs:locprefcity</str>
      <str>+((+datatype:product +(attribs:aprstatus20 attribs:aprstatus40 attribs:aprstatus50) +aggregate:true -attribs:liststatusnfl +((+countryiso:IN +isfcp:true)
Re: timeAllowed is not honored
On Wed, Apr 30, 2014 at 2:16 PM, Aman Tandon amantandon...@gmail.com wrote: <lst name="query"><double name="time">3337.0</double></lst> <lst name="facet"><double name="time">6739.0</double></lst> Most of the time is spent in facet counting. FacetComponent doesn't check timeAllowed right now. You can try to experiment with facet.method=enum, or even with https://issues.apache.org/jira/browse/SOLR-5725, or try to distribute the search with SolrCloud. AFAIK, you can't employ threads to speed up multivalued facets. -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: timeAllowed is not honored
On 4/29/2014 11:43 PM, Aman Tandon wrote: My heap size is 14GB and I am not using SolrCloud currently; the 40GB index is replicated from master to two slaves. I read somewhere that it returns the partial results computed by the query within the amount of time defined by the timeAllowed parameter, but that doesn't seem to happen. Mikhail Khludnev has replied and explained why timeAllowed isn't stopping the query and returning partial results. A 14GB heap is quite large. If you aren't starting Solr with garbage collection tuning parameters, long GC pauses *will* be happening, and that will make some of your queries take a really long time. The wiki page I sent has a section about garbage collection and a link showing the GC tuning parameters that I use. You didn't indicate how much total RAM you have. If your total RAM is 16GB, that's definitely not enough for a 14GB heap and a 40GB index. 32GB of total RAM might be enough, but it also might not be. A perfect-world RAM size for this setup would be at least 54GB -- the total of heap plus index size, not counting the small number of megabytes that the OS and its basic services take. Thanks, Shawn
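The sizing arithmetic in Shawn's replies, written out as a back-of-envelope sketch (units in GB, numbers taken from this thread):

```python
# Back-of-envelope memory sizing from the numbers in this thread (GB).
heap = 14        # Solr JVM heap
index_size = 40  # on-disk index
total_ram = 32   # physical RAM on the box

# RAM left for the OS page cache once the heap is taken
cache_available = total_ram - heap
print(cache_available)  # 18, i.e. less than half of the 40GB index

# Shawn's "perfect world" figure: heap plus room to cache the whole index
ideal_ram = heap + index_size
print(ideal_ram)        # 54
```

The gap between 18GB of cache and a 40GB index is why disk reads, rather than CPU, often dominate query time in this setup.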
Re: timeAllowed is not honored
It's not just FacetComponent; here's the original feature ticket for timeAllowed: https://issues.apache.org/jira/browse/SOLR-502 As I read it, timeAllowed only limits the time spent actually getting documents, not the time spent figuring out what data to get or how. I think that means the primary use-case is serving as a guard against excessive paging. On 4/30/14, 4:49 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: On Wed, Apr 30, 2014 at 2:16 PM, Aman Tandon amantandon...@gmail.com wrote: <lst name="query"><double name="time">3337.0</double></lst> <lst name="facet"><double name="time">6739.0</double></lst> Most of the time is spent in facet counting. FacetComponent doesn't check timeAllowed right now. You can try to experiment with facet.method=enum, or even with https://issues.apache.org/jira/browse/SOLR-5725, or try to distribute the search with SolrCloud. AFAIK, you can't employ threads to speed up multivalued facets. -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: timeAllowed is not honored
Jeff - Thanks, this discussion on JIRA is really quite helpful. Shawn - Yes, we have some plans to move to SolrCloud. Our total index size is 40GB with 11M docs, available RAM is 32GB, allowed heap space for Solr is 14GB, and the GC tuning parameters used on our server are -XX:+UseConcMarkSweepGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDetails -XX:+PrintGCTimeStamps. Mikhail Khludnev - Thanks, I will try facet.method=enum; this will definitely help us improve the time. With Regards Aman Tandon On Wed, Apr 30, 2014 at 8:30 PM, Jeff Wartes jwar...@whitepages.com wrote: It's not just FacetComponent; here's the original feature ticket for timeAllowed: https://issues.apache.org/jira/browse/SOLR-502 As I read it, timeAllowed only limits the time spent actually getting documents, not the time spent figuring out what data to get or how. I think that means the primary use-case is serving as a guard against excessive paging. On 4/30/14, 4:49 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: On Wed, Apr 30, 2014 at 2:16 PM, Aman Tandon amantandon...@gmail.com wrote: <lst name="query"><double name="time">3337.0</double></lst> <lst name="facet"><double name="time">6739.0</double></lst> Most of the time is spent in facet counting. FacetComponent doesn't check timeAllowed right now. You can try to experiment with facet.method=enum, or even with https://issues.apache.org/jira/browse/SOLR-5725, or try to distribute the search with SolrCloud. AFAIK, you can't employ threads to speed up multivalued facets. -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
timeAllowed is not honored
Hi, I am using Solr 4.2 with an index size of 40GB. While querying my index, some queries take a significant amount of time, about 22 seconds, *in the case of minmatch of 50%*. So I added the parameter timeAllowed=2000 to my query, but it doesn't seem to work. Please help me out. With Regards Aman Tandon
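For reference, the parameter as the original post describes it, sketched as request parameters (the q value is illustrative):

```text
# give the search at most 2000 ms
q=misc+items&timeAllowed=2000
```

When the limit actually triggers, the response header carries <bool name="partialResults">true</bool>, as visible in the debug dump posted later in this thread.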
Re: timeAllowed is not honored
On 4/29/2014 10:05 PM, Aman Tandon wrote: I am using Solr 4.2 with an index size of 40GB. While querying my index, some queries take a significant amount of time, about 22 seconds, *in the case of minmatch of 50%*. So I added the parameter timeAllowed=2000 to my query, but it doesn't seem to work. Please help me out. I remember reading that timeAllowed has some limitations about which stages of a query it can limit, particularly in the distributed case. These limitations mean that it cannot always limit the total time for a query. I do not remember precisely what those limitations are, and I cannot find whatever it was that I was reading. When I looked through my local list archive to see if you had ever mentioned how much RAM you have and what the size of your Solr heap is, there didn't seem to be anything. There's not enough information for me to know whether that 40GB is the amount of index data on a single SolrCloud server, or whether it's the total size of the index across all servers. If we leave timeAllowed alone for a moment and treat this purely as a performance problem, usually my questions revolve around figuring out whether you have enough RAM. Here's where that conversation ends up: http://wiki.apache.org/solr/SolrPerformanceProblems I think I've probably mentioned this to you before on another thread. Thanks, Shawn
Re: timeAllowed is not honored
Shawn, this is the first time I have raised this problem. My heap size is 14GB and I am not using SolrCloud currently; the 40GB index is replicated from master to two slaves. I read somewhere that it returns the partial results computed by the query within the amount of time defined by the timeAllowed parameter, but that doesn't seem to happen. Here is the link: http://wiki.apache.org/solr/CommonQueryParameters#timeAllowed *The time allowed for a search to finish. This value only applies to the search and not to requests in general. Time is in milliseconds. Values <= 0 mean no time restriction. Partial results may be returned (if there are any).* With Regards Aman Tandon On Wed, Apr 30, 2014 at 10:05 AM, Shawn Heisey s...@elyograg.org wrote: On 4/29/2014 10:05 PM, Aman Tandon wrote: I am using Solr 4.2 with an index size of 40GB. While querying my index, some queries take a significant amount of time, about 22 seconds, *in the case of minmatch of 50%*. So I added the parameter timeAllowed=2000 to my query, but it doesn't seem to work. Please help me out. I remember reading that timeAllowed has some limitations about which stages of a query it can limit, particularly in the distributed case. These limitations mean that it cannot always limit the total time for a query. I do not remember precisely what those limitations are, and I cannot find whatever it was that I was reading. When I looked through my local list archive to see if you had ever mentioned how much RAM you have and what the size of your Solr heap is, there didn't seem to be anything. There's not enough information for me to know whether that 40GB is the amount of index data on a single SolrCloud server, or whether it's the total size of the index across all servers. If we leave timeAllowed alone for a moment and treat this purely as a performance problem, usually my questions revolve around figuring out whether you have enough RAM. Here's where that conversation ends up: http://wiki.apache.org/solr/SolrPerformanceProblems I think I've probably mentioned this to you before on another thread. Thanks, Shawn