Re: Function Query Optimization
Should SubQuery be faster than FunctionQuery?

On Sat, Dec 12, 2020 at 10:24 AM Vincenzo D'Amore wrote:
> Hi, looking at this sample it seems you have just one document for '12345', one for '23456' and so on so forth. If this is true, why don't just try with a subquery
> https://lucene.apache.org/solr/guide/6_6/transforming-result-documents.html#TransformingResultDocuments-_subquery_
> [...]
>
> --
> Vincenzo D'Amore
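For reference, the [subquery] document transformer Vincenzo links to attaches a per-document subquery result as a pseudo-field, which could replace the concat/if chain. A rough sketch of building such a request in Python; the core is not shown, and the `matched` pseudo-field name and query values are illustrative assumptions, not from the thread:

```python
from urllib.parse import urlencode

# Sketch of a [subquery] document transformer request (see the linked
# ref guide). The 'matched' pseudo-field name is an illustrative choice.
params = {
    "q": "field1:(12345 23456 34567 45678)",
    # attach a 'matched' pseudo-field computed by a subquery per document
    "fl": "*,matched:[subquery]",
    # the subquery is driven by each row's own field1 value
    "matched.q": "{!terms f=field1 v=$row.field1}",
    "matched.fl": "field1",
    "matched.rows": "1",
}
print(urlencode(params))
```

The encoded string would be appended to the collection's /select URL.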
Re: Function Query Optimization
Hi, looking at this sample it seems you have just one document for '12345', one for '23456', and so on and so forth. If this is true, why not just try with a subquery:

https://lucene.apache.org/solr/guide/6_6/transforming-result-documents.html#TransformingResultDocuments-_subquery_

On Fri, Dec 11, 2020 at 3:31 PM Jae Joo wrote:
> I have the requirement to create field - xyz to be returned based on the matched result.
> [...]

--
Vincenzo D'Amore
Function Query Optimization
I have the requirement to create a field - xyz - to be returned based on the matched result. Here is the code:

XYZ:concat(
  if(exists(query({!v='field1:12345'})), '12345', ''),
  if(exists(query({!v='field1:23456'})), '23456', ''),
  if(exists(query({!v='field1:34567'})), '34567', ''),
  if(exists(query({!v='field:45678'})), '45678', ''))

I feel this is very complex, so I am looking for some smarter and faster ideas.

Thanks,
Jae
Re: query optimization
https://lucene.apache.org/solr/guide/6_6/common-query-parameters.html#CommonQueryParameters-ThedebugParameter

On Wed, Jul 3, 2019 at 10:10 AM Midas A wrote:
> Please suggest here
>
> On Wed, Jul 3, 2019 at 10:23 AM Midas A wrote:
> > Hi,
> > How can i optimize following query it is taking time
> > [...]
> > hits=20268 status=0 QTime=10659

--
Sincerely yours
Mikhail Khludnev
Re: query optimization
Please suggest here

On Wed, Jul 3, 2019 at 10:23 AM Midas A wrote:
> Hi,
> How can i optimize following query it is taking time
> [...]
> hits=20268 status=0 QTime=10659
query optimization
Hi, How can i optimize following query it is taking time webapp=/solr path=/search params={ df=ttl=0=true=1=true=true=0=0=contents^0.05+currdesig^1.5+predesig^1.5+lng^2+ttl+kw_skl+kw_it=false=ttl,kw_skl,kw_it,contents==1=ttl^0.1+currdesig^0.1+predesig^0.1=0=/resumesearch="mbbss"+OR+"medicine"=2=true=mbbs,+"medical+officer",+doctor,+physician+("medical+officer")+"medical+officer"+"physician""+""general+physician""+""physicians""+""consultant+physician""+""house+physician"+"physician"+"doctor"+"mbbs"+"general+physician"+"physicians"+"consultant+physician"+"house+physician"=(293)=false==none=id,upt=1=OR=NOT+contents:("liaise+with+medical+officer"+"worked+with+medical+officer"+"working+with+medical+officer"+"reported+to+medical+officer"+"references+are+medical+officer"+"coordinated+with+medical+officer"+"closely+with+medical+officer"+"signature+of+medical+officer"+"seal+of++medical+officer"+"liaise+with+physician"+"worked+with+physician"+"working+with+physician"+"reported+to+physician"+"references+are+physician"+"coordinated+with+physician"+"closely+with+physician"+"signature+of+physician"+"seal+of++physician"+"liaise+with+doctor"+"worked+with+doctor"+"working+with+doctor"+"reported+to+doctor"+"references+are+doctor"+"coordinated+with+doctor"+"closely+with+doctor"+"signature+of+doctor"+"seal+of++doctor")=NOT+hemp:("xmwxagency"+"xmwxlimited"+"xmwxplacement"+"xmwxplus"+"xmwxprivate"+"xmwxsecurity"+"xmwxz2"+"xmwxand"+"xswxz2+plus+placement+and+security+agency+private+limited"+"xswxz2+plus+placement+and+security+agency+private"+"xswxz2+plus+placement+and+security+agency"+"xswxz2+plus+placement+and+security"+"xswxz2+plus+placement+and"+"xswxz2+plus+placement"+"xswxz2+plus"+"xswxz2")=ctc:[100.0+TO+107.2]+OR+ctc:[-1.0+TO+-1.0]=(dlh:(22))=ind:(24++42++24++8)=(rol:(292+293+294+322))=(cat:(9))=cat:(1000+OR+907+OR+1+OR+2+OR+3+OR+786+OR+4+OR+5+OR+6+OR+7+OR+8+OR+9+OR+10+OR+11+OR+12+OR+13+OR+14+OR+785+OR+15+OR+16+OR+17+OR+18+OR+908+OR+19+OR+20+OR+21+OR+23+OR+24)=NOT+is_udis:2=is_resu
me:0^-1000=upt_date:[*+TO+NOW/DAY-36MONTHS]^2=upt_date:[NOW/DAY-36MONTHS+TO+NOW/DAY-24MONTHS]^3=upt_date:[NOW/DAY-24MONTHS+TO+NOW/DAY-12MONTHS]^4=upt_date:[NOW/DAY-12MONTHS+TO+NOW/DAY-9MONTHS]^5=upt_date:[NOW/DAY-9MONTHS+TO+NOW/DAY-6MONTHS]^10=upt_date:[NOW/DAY-6MONTHS+TO+NOW/DAY-3MONTHS]^15=upt_date:[NOW/DAY-3MONTHS+TO+*]^20=_query_:"{!edismax+qf%3Drol^2+pf%3Did+ps%3D1+pf2%3Did+ps2%3D1+pf3%3Did+ps3%3D1+v%3D$typeId+q.op%3DOR+bq%3D\$bq1+bf%3D}"=_query_:"{!edismax+qf%3Drol^2+pf%3Did+ps%3D1+pf2%3Did+ps2%3D1+pf3%3Did+ps3%3D1+v%3D$typeId+q.op%3DOR+bq%3D\$bq1+bf%3D}"=_query_:"{!edismax+qf%3Drol^2+pf%3Did+ps%3D1+pf2%3Did+ps2%3D1+pf3%3Did+ps3%3D1+v%3D$typeId+q.op%3DOR+bq%3D\$bq1+bf%3D}"=dlh:(22)^8={!boost+b%3D4}+_query_:{!edismax+qf%3D"currdesig^8+predesig^6+ttl^3+kw_skl^2+contents"+v%3D"\"doctor\"+\"medical+officer\"+\"physician\""+q.op%3DAND+bq%3D}=_query_:{!edismax+qf%3D"currdesig+predesig+ttl+kw_skl+contents^0.01"+v%3D"\"doctor\"+\"medical+officer\"+\"physician\""+q.op%3DOR+bq%3D}=NOT+country:isoin^-10=exp:[+10+TO+11+]=exp:[+11+TO+13+]=exp:[+13+TO+15+]=exp:[+15+TO+17+]=exp:[+17+TO+20+]=exp:[+20+TO+25+]=exp:[+25+TO+109+]=ctc:[+100+TO+101+]=ctc:[+101+TO+101.5+]=ctc:[+101.5+TO+102+]=ctc:[+102+TO+103+]=ctc:[+103+TO+104+]=ctc:[+104+TO+105+]=ctc:[+105+TO+107.5+]=ctc:[+107.5+TO+110+]=ctc:[+110+TO+115+]=ctc:[+115+TO+10100+]=1=(22)=javabin=(293)=(294)=(322)=ind=cat=rol=cl=pref=false=1=0=40=((mbbs+OR+_query_:"{!edismax+qf%3Ddlh+pf%3Did+ps%3D1+pf2%3Did+ps2%3D1+pf3%3Did+ps3%3D1+v%3D$queryany3+q.op%3DOR+bq%3D$bq1+bf%3D}")+OR+((("medical+officer")+OR+"medical+officer"~0)+OR+_query_:"{!edismax+qf%3Drol+pf%3Did+ps%3D1+pf2%3Did+ps2%3D1+pf3%3Did+ps3%3D1+v%3D$queryany0+q.op%3DOR+bq%3D$bq1+bf%3D}")+OR+(("doctor"+OR+doctor)+OR+_query_:"{!edismax+qf%3Drol+pf%3Did+ps%3D1+pf2%3Did+ps2%3D1+pf3%3Did+ps3%3D1+v%3D$queryany2+q.op%3DOR+bq%3D$bq1+bf%3D}")+OR+(("physician"+OR+"physicians"+OR+"general+physician"+OR+"house+physician"+OR+"consultant+physician"+OR+physician)+OR+_query_:"{!edismax+qf%3Drol
+pf%3Did+ps%3D1+pf2%3Did+ps2%3D1+pf3%3Did+ps3%3D1+v%3D$queryany1+q.op%3DOR+bq%3D$bq1+bf%3D}")+OR+_query_:"{!edismax+qf%3D\$semanticfieldskl+pf%3Did+ps%3D1+pf2%3Did+ps2%3D1+pf3%3Did+ps3%3D1+v%3D\$semantictermsskl+q.op%3DOR+bq%3D\$bq1+bf%3D}"+OR+_query_:"{!edismax+qf%3D\$semanticfieldttl+pf%3Did+ps%3D1+pf2%3Did+ps2%3D1+pf3%3Did+ps3%3D1+v%3D\$semantictermsttl+q.op%3DAND+bq%3D\$bq1+bf%3D}")=10=id=kw_skl^0.05+kw_it^0.05+ttl^0.05+currdesig^0.05+predesig^0.05=1=id=id=true} hits=20268 status=0 QTime=10659
Re: Query optimization
Oops, I forgot the link: http://yonik.com/solr/paging-and-deep-paging/

On Friday, July 29, 2016 9:51 AM, Ahmet Arslan wrote:
> Hi Midas,
> Please search 'deep paging' over the documentation, mailing list, etc.
> Solr Deep Paging and Sorting
> Ahmet
> [...]
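The fix described in the linked article is cursor-based paging: with a high start=N, Solr must collect and discard all N preceding hits on every request, while a cursor resumes where the last page ended. A minimal sketch of building successive requests; the query and the `id` uniqueKey name are assumptions:

```python
from urllib.parse import urlencode

# Deep-paging sketch using cursorMark (per the linked article).
# Each page passes the nextCursorMark value returned by the previous
# response instead of an ever-growing start= offset.
def page_params(cursor="*", rows=100):
    return urlencode({
        "q": "*:*",
        "rows": rows,
        # the sort must include the uniqueKey field as a tie-breaker
        "sort": "score desc, id asc",
        "cursorMark": cursor,  # '*' starts a new cursor
    })

first_page = page_params()
print(first_page)
```

Subsequent calls would pass the `nextCursorMark` from each response until it stops changing.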
Re: Query optimization
Hi Midas,

Please search 'deep paging' over the documentation, mailing list, etc.

Solr Deep Paging and Sorting

Ahmet

On Friday, July 29, 2016 9:21 AM, Midas A wrote:
> please reply .
>
> On Fri, Jul 29, 2016 at 10:26 AM, Midas A wrote:
> > a) my index size is 10 gb for higher start is query response got slow .
> > what should i do to optimize this query for higher start value in query
Re: Query optimization
please reply .

On Fri, Jul 29, 2016 at 10:26 AM, Midas A wrote:
> a) my index size is 10 gb for higher start is query response got slow .
> what should i do to optimize this query for higher start value in query
Query optimization
a) My index size is 10 GB. For higher start values, the query response gets slow. What should I do to optimize this query for a higher start value?
Re: Query optimization
Hi,

One more thing I would like to add here: we build facet queries over dynamic fields, so my questions are:
a) Is there any harm in using docValues=true with dynamic fields?
b) What else can we implement to optimize this query? My index size is 8 GB and the query is taking more than 3 seconds.

Regards,
Abhishek Tiwari

On Thu, Jul 14, 2016 at 6:42 AM, Erick Erickson wrote:
> DocValues are now the preferred mechanism whenever you need to sort, facet or group.
> [...]
Re: Query optimization
DocValues are now the preferred mechanism whenever you need to sort, facet or group. It'll make your on-disk index bigger, but the on-disk structure would have been built in Java's memory if you didn't use DocValues, whereas if you do, it's MMap'd.

So overall, use DocValues by preference.

Best,
Erick

On Wed, Jul 13, 2016 at 5:36 AM, sara hajili wrote:
> as i know when you use docValue=true solr when indexing doc, solr although store doc and docValue=true field in memory...
> [...]
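Concretely, Erick's advice translates into a schema change (which requires reindexing the affected fields). A hypothetical schema.xml fragment; the field and type names are illustrative, not taken from this thread:

```xml
<!-- illustrative: enable docValues on a field used for sorting/faceting -->
<field name="sort_price" type="tfloat" indexed="true" stored="true" docValues="true"/>

<!-- dynamic fields can carry docValues too -->
<dynamicField name="by_*" type="string" indexed="true" stored="true" docValues="true"/>
```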
Re: Query optimization
As far as I know, when you use docValues=true, Solr, when indexing a doc, also stores the docValues=true field in memory, to use for facet queries and sorting query results. So using docValues=true a lot may use a lot of your system's memory, but used in a logical way it can make for better query response time.

On Wed, Jul 13, 2016 at 5:11 AM, Midas A wrote:
> Is there any draw back of using docValues=true ?
> [...]
Re: Query optimization
Is there any drawback of using docValues=true ?

On Wed, Jul 13, 2016 at 2:28 PM, sara hajili wrote:
> Hi.
> Facet query take a long time.you vcan use group query. Or in fileds in schema that you run facet query on that filed, set doc value=true, to get better answer in quick time.
> [...]
Re: Query optimization
Hi. Facet queries take a long time; you can use a group query. Or, on the fields in the schema that you run facet queries on, set docValues=true, to get a better answer in quick time.

On Jul 13, 2016 11:54 AM, "Midas A" wrote:
> [...]
> What kind of optimization we can do in above query . it is taking 2400 ms .
Query optimization
http:// #:8983/solr/prod/select?q=id_path_ids:166=sort_price:[0%20TO%20*]=status:A=company_status:A=true=1=show_meta_id=show_brand=product_amount_available=by_processor=by_system_memory=by_screen_size=by_operating_system=by_laptop_type=by_processor_brand=by_hard_drive_capacity=by_touchscreen=by_warranty=by_graphic_memory=is_trm=show_merchant=is_cod=show_market={!ex=p_r%20key=product_rating:[4-5]}product_rating:[4%20TO%205]={!ex=p_r%20key=product_rating:[3-5]}product_rating:[3%20TO%205]={!ex=p_r%20key=product_rating:[2-5]}product_rating:[2%20TO%205]={!ex=p_r%20key=product_rating:[1-5]}product_rating:[1%20TO%205]={!ex=m_r%20key=merchant_rating:[4-5]}merchant_rating:[4%20TO%205]={!ex=m_r%20key=merchant_rating:[3-5]}merchant_rating:[3%20TO%205]={!ex=m_r%20key=merchant_rating:[2-5]}merchant_rating:[2%20TO%205]={!ex=m_r%20key=merchant_rating:[1-5]}merchant_rating:[1%20TO%205]=500=true=sort_price=0=10=product_amount_available%20desc,boost_index%20asc,popularity%20desc,is_cod%20desc What kind of optimization we can do in above query . it is taking 2400 ms .
Filter query optimization
If a filter query matches nothing, then no additional query should be performed and no results returned? I don't think we have this today?
Re: Filter query optimization
Yonik,

> this is a fast operation anyway

Can you elaborate on why this is a fast operation?

Basically there's a distributed query with a filter where, on a number of the servers, the filter query isn't matching anything; however, I'm seeing load on those servers (where nothing matches). So I'm assuming the filter is generated (and cached), which is fine, and then the user query is being performed against a filter where no documents match. I could be misinterpreting the data; however, I want to find out about this use case regardless, as it will likely crop up again for us.

-J

On Mon, Oct 19, 2009 at 12:07 PM, Yonik Seeley yo...@lucidimagination.com wrote:
> On Mon, Oct 19, 2009 at 2:55 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote:
> > If a filter query matches nothing, then no additional query should be performed and no results returned? I don't think we have this today?
>
> No, but this is a fast operation anyway (in Solr 1.4 at least). Another thing to watch out for is to not try this with filters that you don't know the size of (or else you may force a popcount on a BitDocSet that would not otherwise have been needed). It could also potentially complicate warming queries - need to be careful that the combination of filters you are warming with matches something, or it would cause the fieldCache entries to not be populated.
>
> -Yonik
> http://www.lucidimagination.com
Re: Filter query optimization
On Mon, Oct 19, 2009 at 4:45 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote:
> Yonik, "this is a fast operation anyway" - can you elaborate on why this is a fast operation?
> [...]

The scorers will never really be used. The query will be weighted and scorers will be created, but the filter will be checked first and return NO_MORE_DOCS.

-Yonik
http://www.lucidimagination.com
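Yonik's point can be illustrated with a toy model of the leap-frog intersection a conjunction performs (this is an illustrative sketch in Python, not Lucene's actual code): the cached filter's iterator is advanced first, and an empty filter is exhausted immediately (Lucene signals this with the NO_MORE_DOCS sentinel), so the query scorer never advances at all.

```python
def advance(docs, i, target):
    """Index of the first doc id >= target, scanning from position i."""
    while i < len(docs) and docs[i] < target:
        i += 1
    return i

def intersect(filter_docs, query_docs):
    """Leap-frog intersection of two sorted doc-id lists (illustrative)."""
    out, i, j = [], 0, 0
    while True:
        if i >= len(filter_docs):  # filter exhausted -- immediately, if empty
            return out             # the query side does no further work
        target = filter_docs[i]
        j = advance(query_docs, j, target)
        if j >= len(query_docs):
            return out
        if query_docs[j] == target:
            out.append(target)
            i += 1
        else:
            i = advance(filter_docs, i, query_docs[j])

print(intersect([], [1, 2, 3]))         # empty filter: nothing to score
print(intersect([2, 4, 7], [1, 2, 4]))  # [2, 4]
```

The real cost Jason observed would then come from weighting and scorer creation, not from document scoring.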
Re: Filter query optimization
Ok, thanks, new Lucene 2.9 features.

On Mon, Oct 19, 2009 at 2:33 PM, Yonik Seeley yo...@lucidimagination.com wrote:
> The scorers will never really be used. The query will be weighted and scorers will be created, but the filter will be checked first and return NO_MORE_DOCS.
> [...]
Re: Search query optimization
If I know that condition C will eliminate more results than either A or B, does specifying the query as: C AND A AND B make it any faster (than the original A AND B AND C)?
Re: Search query optimization
: If I know that condition C will eliminate more results than either A or B, : does specifying the query as: C AND A AND B make it any faster (than the : original A AND B AND C)? Nope. Lucene takes care of that for you. -Hoss
RE: Search query optimization
Hi, Thanks for your reply. I did some tests on my test machine. http://stage.boomi.com:8080/solr/select/?q=account:1&rows=1000 returns a result set of 384 in 3 ms. If I add a new AND condition as below, http://stage.boomi.com:8080/solr/select/?q=account:1+AND+recordeddate_dt:[NOW/DAYS-7DAYS+TO+NOW]&rows=1000 takes 18236 ms to return a result set of 21. If I only use the recordeddate_dt condition, as in http://stage.boomi.com:8080/solr/select/?q=recordeddate_dt:[NOW/DAYS-7DAYS+TO+NOW]&rows=1000, it takes 20271 ms to get 412800 results. All the above URLs are live; you can test them. Can anyone give me some explanation of why this happens if we have the query optimization? Thank you very much. Yongjun Rong -Original Message- From: Walter Underwood [mailto:[EMAIL PROTECTED] Sent: Thursday, May 29, 2008 4:57 PM To: solr-user@lucene.apache.org Subject: Re: Search query optimization The people working on Lucene are pretty smart, and this sort of query optimization is a well-known trick, so I would not worry about it. wunder
RE: Search query optimization
Thanks for the reply. Here is the debugQuery output: rawquerystring: account:1 AND recordeddate_dt:[NOW/DAYS-1DAYS TO NOW] querystring: account:1 AND recordeddate_dt:[NOW/DAYS-1DAYS TO NOW] parsedquery: +account:1 +recordeddate_dt:[2008-06-16T00:00:00.000Z TO 2008-06-17T17:07:57.420Z] parsedquery_toString: +account:1 +recordeddate_dt:[2008-06-16T00:00:00.000 TO 2008-06-17T17:07:57.420] explain for id=e03dbd92-3d41-4693-8b69-ac9a0d332446-atom-d52484f5-7aa8-40b3-ad6f-ba3a9071999e, internal_docid=6515410: 10.88071 = (MATCH) sum of: 10.788804 = (MATCH) weight(account:1 in 6515410), product of: 0.9957678 = queryWeight(account:1), product of: 10.834659 = idf(docFreq=348, numDocs=6515640) 0.09190578 = queryNorm 10.834659 = (MATCH) fieldWeight(account:1 in 6515410), product of: 1.0 = tf(termFreq(account:1)=1) 10.834659 = idf(docFreq=348, numDocs=6515640) 1.0 = fieldNorm(field=account, doc=6515410) 0.09190578 = (MATCH) ConstantScoreQuery(recordeddate_dt:[2008-06-16T00:00:00.000-2008-06-17T17:07:57.420]), product of: 1.0 = boost 0.09190578 = queryNorm -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Tuesday, June 17, 2008 12:43 PM To: solr-user@lucene.apache.org Subject: Re: Search query optimization Hi, Probably because the [NOW/DAYS-7DAYS+TO+NOW] part gets rewritten as lots of OR clauses. I think that you'll see that if you add debugQuery=true to the URL. Make sure your recordeddate_dt is not too granular (e.g. if you don't need minutes, round the values to hours. If you don't need hours, round the values to days). Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
Re: Search query optimization
Hi, This is what I was talking about: recordeddate_dt:[2008-06-16T00:00:00.000Z TO 2008-06-17T17:07:57.420Z] Note that the granularity of this date field is down to milliseconds. You should change that to be more coarse if you don't need such precision (e.g. no milliseconds, no seconds, no minutes, no hours...) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
Re: Search query optimization
: Probably because the [NOW/DAYS-7DAYS+TO+NOW] part gets rewritten as lots : of OR clauses. I think that you'll see that if you add debugQuery=true : to the URL. Make sure your recordeddate_dt is not too granular (e.g. : if you don't need minutes, round the values to hours. If you don't need : hours, round the values to days). for the record: it doesn't get rewritten to a lot of OR clauses, it's using ConstantScoreRangeQuery. granularity is definitely important however, both when indexing and when querying. NOW is milliseconds, so every time you execute that query it's different and there is almost no caching possible. if you use [NOW/DAY-7DAYS TO NOW/DAY] or even [NOW/DAY-7DAYS TO NOW/HOUR] you'll get a lot better caching behavior. it looks like you are trying to find anything in the past week, so you may want [NOW/DAY-7DAYS TO NOW/DAY+1DAY] (to go to the end of the current day). once you have a less granular date restriction, it can frequently make sense to put it in a separate fq clause, so it will get cached independently of your main query. But Otis's point about reducing granularity can also help when indexing ... the fewer unique dates that appear in your index, the faster range queries will be ... if you've got 1000 documents that all have a recordeddate of June 11 2008, but at different times, and you're never going to care about the times (just the date), then strip those times off when indexing so they all have the same field value of 2008-06-11T00:00:00Z. BTW: on the solr port you sent out a URL to, all of its caching is turned off (the filterCache and queryResultCache configs are commented out of your solrconfig.xml) ... you're going to want to turn on some caching or you'll never see really *great* request times. -Hoss
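The granularity advice above can be sketched in plain Python (an illustration, not Solr code; `round_to_day` is an invented helper standing in for NOW/DAY-style rounding at index time):

```python
# Rounding timestamps to the day before indexing collapses many distinct
# term values into one, and a day-rounded query string stays identical all
# day, so a cached filter for it can actually be reused.
from datetime import datetime

def round_to_day(dt):
    """Index-side analogue of NOW/DAY: strip everything below the day."""
    return dt.replace(hour=0, minute=0, second=0, microsecond=0)

events = [datetime(2008, 6, 11, h, 30, 15, 123000) for h in range(24)]
raw_terms = {dt.isoformat() for dt in events}         # millisecond precision
day_terms = {round_to_day(dt).isoformat() for dt in events}
assert len(raw_terms) == 24 and len(day_terms) == 1   # one indexed term per day

# A millisecond-precise NOW changes on every request; NOW/DAY does not, so
# this exact string recurs from request to request and is cache-friendly:
fq = "recordeddate_dt:[NOW/DAY-7DAYS TO NOW/DAY]"
```

Fewer unique terms means the range query has less to iterate over, and a stable query string means the filterCache can pay the cost once.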
RE: Search query optimization
Hi Otis, Thanks for your advice. Do you mean that when we add the date data we need to carefully select the granularity of the date field to make sure it is more coarse? How can we do this? We only access Solr via the HTTP URL, not the API. If you are talking about the query syntax, we do use NOW/DAY to round to the day. Yongjun Rong -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Tuesday, June 17, 2008 1:32 PM To: solr-user@lucene.apache.org Subject: Re: Search query optimization Hi, This is what I was talking about: recordeddate_dt:[2008-06-16T00:00:00.000Z TO 2008-06-17T17:07:57.420Z] Note that the granularity of this date field is down to milliseconds.
RE: Search query optimization
Hi Chris, Thanks for your suggestions. I did try the [NOW/DAY-7DAYS TO NOW/DAY], but it is not better. And I tried [NOW/DAY-7DAYS TO NOW/DAY+1DAY], I got some exception as below: org.apache.solr.core.SolrException: Query parsing error: Cannot parse 'account:1 AND recordeddate_dt:[NOW/DAYS-7DAYS TO NOW/DAY 1DAY]': Encountered 1DAY at line 1, column 57. Was expecting: ] ... at org.apache.solr.search.QueryParsing.parseQuery(QueryParsing.java:104) at org.apache.solr.request.StandardRequestHandler.handleRequestBody(Standar dRequestHandler.java:109) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerB ase.java:77) at org.apache.solr.core.SolrCore.execute(SolrCore.java:658) at org.apache.solr.servlet.SolrServlet.doGet(SolrServlet.java:66) at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHan dler.java:1093) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.j ava:185) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHan dler.java:1084) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:360) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:2 16) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:726) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandler Collection.java:206) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.jav a:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:324) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:505) 
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:828) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:514) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:211) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:380) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:395) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:450) Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse 'account:1 AND recordeddate_dt:[NOW/DAYS-7DAYS TO NOW/DAY 1DAY]': Encountered 1DAY at line 1, column 57. Was expecting: ] ... at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:152) at org.apache.solr.search.QueryParsing.parseQuery(QueryParsing.java:94) ... 26 more I will also try turning on the cache to see if I can get a better query time, and will let you know. Thank you very much. Yongjun Rong
RE: Search query optimization
: Thanks for your suggestions. I did try the [NOW/DAY-7DAYS TO : NOW/DAY], but it is no better. And I tried [NOW/DAY-7DAYS TO : NOW/DAY+1DAY], I got some exception as below: : org.apache.solr.core.SolrException: Query parsing error: Cannot parse : 'account:1 AND recordeddate_dt:[NOW/DAYS-7DAYS TO NOW/DAY 1DAY]': : Encountered 1DAY at line 1, column 57. you need to properly URL-escape the + character as %2B in your URLs. : And I will try to open the cache and see if I can get better query time. the first request won't be any faster, but the second request will be. and if filtering by week is something you expect people to do a lot of, you can put it in a newSearcher warming query so it's always warmed up and fast for everyone. -Hoss
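The %2B point is easy to demonstrate with Python's standard urllib (this illustrates generic URL decoding, nothing Solr-specific):

```python
# In a URL query string, an unescaped '+' decodes to a space on the server
# side -- exactly the "NOW/DAY 1DAY" the ParseException above complained
# about. Escaping '+' as %2B preserves the original expression.
from urllib.parse import quote, unquote_plus

assert unquote_plus("NOW/DAY+1DAY") == "NOW/DAY 1DAY"   # '+' arrived as a space

raw = "recordeddate_dt:[NOW/DAY-7DAYS TO NOW/DAY+1DAY]"
escaped = quote(raw, safe=":[]/-")   # quote() percent-encodes '+' and spaces
assert "%2B" in escaped
assert unquote_plus(escaped) == raw  # round-trips intact after decoding
```

So the fix for the exception above is to send `NOW/DAY%2B1DAY` on the wire rather than `NOW/DAY+1DAY`.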
RE: Search query optimization
Hi Chris, Thank you very much for the detailed suggestions. I just did the cache test. If most requests return the same set of data, the cache improves query performance. But in our usage, almost all requests return a different data set, and the cache hit ratio is very low. That's the reason we turned the cache off, to save memory. Another question: q=account:1+AND+recordeddate_dt:[NOW/DAY-7DAYS+TO+NOW/DAY] will combine the result sets of account:1 and recordeddate_dt:[NOW/DAY-7DAYS+TO+NOW/DAY]. How does Lucene handle this? From my previous test examples, it seems Lucene does not check the size of the subconditions (like account:1 or recordeddate_dt:[NOW/DAY-7DAYS+TO+NOW/DAY]). q=account:1 returns a small set of data, but q=recordeddate_dt:[NOW/DAY-7DAYS+TO+NOW/DAY] returns a large set of data. If we combine them with AND, the query should find the small set of data first and then apply the subcondition recordeddate_dt:[NOW/DAY-7DAYS+TO+NOW/DAY]. But from the response time, it seems that is not the case. Can anyone give me a detailed explanation of this? Thank you very much. Yongjun Rong
RE: Search query optimization
: test. If most of requests return the same set of data, cache will : improve the query performance. But in our usage, almost all requests : have different data set to return. The cache hit ratio is very low. that's why i suggested moving clauses that are likely to be common (ie: your "within the last week" clause) into a separate fq param where it can be cached independently from the main query. if you do that *and* you have the filterCache turned on, then after this query... q=account:1&fq=recordeddate_dt:[NOW/DAY-7DAYS+TO+NOW/DAY] ...any later query with the same fq will be fairly fast because of the cache hit... q=anything+you+want&fq=recordeddate_dt:[NOW/DAY-7DAYS+TO+NOW/DAY] : my previous test examples, it seems lucene will not check the size of : the subconditions (like account:1 or : recordeddate_dt:[NOW/DAY-7DAYS+TO+NOW/DAY]). Q=account:1 will return a : small set of data. But q=recordeddate_dt:[NOW/DAY-7DAYS+TO+NOW/DAY] will : return a large set of data. If we combine them with AND like: : q=account+AND+recordeddate_dt:[NOW/DAY-7DAYS+TO+NOW/DAY]. It should : return the small set of data and then apply the subcondition : recordeddate_dt:[NOW/DAY-7DAYS+TO+NOW/DAY]. But from the response the ConjunctionScorer will do that (as mentioned earlier in this thread), but even if the account:1 clause indicates that it can skip ahead to *document* #1234567, the ConstantScoreRangeQuery still needs to iterate over all of the *terms* in the specified range before it knows which is the lowest matching doc id above #1234567. that's why putting range queries into separate fq params can be a lot better ... that term iteration only needs to be done once and can then be cached and reused. -Hoss
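Hoss's filterCache point can be sketched as a toy model (all names here are invented stand-ins; Solr's real filterCache is more sophisticated): a clause moved into fq is cached under its own key, so a later request with a completely different q reuses the doc set instead of re-walking the range's terms.

```python
# Toy filterCache: the expensive range evaluation runs once per distinct fq
# string, and every subsequent q intersects against the cached doc set.

filter_cache = {}
range_evaluations = 0

def eval_range_filter(fq):
    global range_evaluations
    range_evaluations += 1          # the expensive term iteration happens here
    return frozenset([2, 5, 9])     # pretend: docs from the last week

def search(q_docs, fq):
    if fq not in filter_cache:
        filter_cache[fq] = eval_range_filter(fq)
    return sorted(set(q_docs) & filter_cache[fq])

fq = "recordeddate_dt:[NOW/DAY-7DAYS+TO+NOW/DAY]"
assert search({1, 2, 3}, fq) == [2]      # first request pays for the filter
assert search({5, 9, 11}, fq) == [5, 9]  # different q, same fq: cache hit
assert range_evaluations == 1            # the range was evaluated only once
```

This is also why a millisecond-precise NOW defeats the cache: the fq string differs on every request, so every request pays the full cost.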
Search query optimization
Hi, I have a question about how the Lucene query parser works. For example, I have the query A AND B AND C. Will Lucene extract all documents satisfying condition A into memory and then filter them with conditions B and C? Or will only the documents satisfying A AND B AND C be put into memory? Are there any articles discussing how to build an optimized query to save memory and improve performance? Thank you very much. Yongjun Rong
Re: Search query optimization
On Thu, May 29, 2008 at 4:05 PM, Yongjun Rong [EMAIL PROTECTED] wrote: I have a question about how the lucene query parser. For example, I have query A AND B AND C. Will lucene extract all documents satisfy condition A in memory and then filter it with condition B and C? No, Lucene will try and optimize this the best it can. It roughly goes like this.. docnum = find_match(A) docnum = find_first_match_after(docnum, B) docnum = find_first_match_after(docnum,C) etc... until the same docnum is returned for A,B, and C. See ConjunctionScorer for the gritty details. -Yonik or only the documents satisfying A AND B AND C will be put into memory? Is there any articles discuss about how to build a optimization query to save memory and improve performance? Thank you very much. Yongjun Rong
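Yonik's find_match / find_first_match_after pseudocode can be turned into a small runnable sketch (a simplification of Lucene's ConjunctionScorer; `PostingIterator` and `conjunction` are invented names for illustration):

```python
# Leapfrog intersection: no term's full result set is ever materialized.
# Each iterator only skips forward to the current candidate doc, so memory
# stays proportional to the number of terms, not the number of matches.

NO_MORE_DOCS = 2**31 - 1  # sentinel, as in Lucene's DocIdSetIterator

class PostingIterator:
    """A sorted doc-ID list with skip-to semantics (find_first_match_after)."""
    def __init__(self, docs):
        self._docs = sorted(docs)
        self._pos = 0

    def advance(self, target):
        # move to the first doc >= target
        while self._pos < len(self._docs) and self._docs[self._pos] < target:
            self._pos += 1
        return self._docs[self._pos] if self._pos < len(self._docs) else NO_MORE_DOCS

def conjunction(term_postings):
    """Docs matching every term, e.g. A AND B AND C."""
    iters = [PostingIterator(p) for p in term_postings]
    hits = []
    doc = iters[0].advance(0)
    while doc != NO_MORE_DOCS:
        candidate = doc
        for it in iters[1:]:              # ask the others to catch up
            candidate = it.advance(candidate)
        if candidate == doc:              # all iterators agreed: a match
            hits.append(doc)
            doc = iters[0].advance(doc + 1)
        else:                             # someone skipped past: new candidate
            doc = iters[0].advance(candidate)
    return hits

assert conjunction([[1, 3, 5, 7], [3, 4, 5], [5, 6, 7]]) == [5]
```

This also shows why clause order barely matters for correctness: whichever clause skips furthest drags the others along with it.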
RE: Search query optimization
Hi Yonik, Thanks for your quick reply. I'm very new to the Lucene source code. Can you give me a little more detailed explanation about this? Do you think it will save some memory if docnum = find_match(A) docnum = find_match(B), putting B in front of the AND query like B AND A AND C? How about sorting (sort=A,B,C&q=A AND B AND C)? Do you think the order of the conditions (A, B, C) in a query will affect the performance of the query? Thank you very much. Yongjun -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Thursday, May 29, 2008 4:12 PM To: solr-user@lucene.apache.org Subject: Re: Search query optimization No, Lucene will try and optimize this the best it can. It roughly goes like this.. docnum = find_match(A) docnum = find_first_match_after(docnum, B) docnum = find_first_match_after(docnum, C) etc... until the same docnum is returned for A, B, and C. See ConjunctionScorer for the gritty details. -Yonik
Re: Search query optimization
The people working on Lucene are pretty smart, and this sort of query optimization is a well-known trick, so I would not worry about it. A dozen years ago at Infoseek, we checked the count of matches for each term in an AND, and evaluated the smallest one first. If any of them had zero matches, we didn't evaluate any of them. I expect that Doug Cutting and the other Lucene folk know those same tricks. wunder