graph traversal filter which uses document value in the query

2021-03-04 Thread Lee Carroll
Hi All,
I'm using the graph query parser to traverse a set of edge documents. An
edge looks like

"id":"edge1", "recordType":"journey", "Date":"2021-03-04T00:00:00Z", "Origin
":"AAC", "OriginLocalDateTime":"2021-03-04T05:00:00Z", "Destination":"AAB",
"DestinationLocalDateTime":"2021-03-04T07:00:00Z"

I'd like to collect the journeys needed to travel from an origin city to a
destination city in a single hop (a-b-c), where all journeys are made on the
same day. I'm using a traversal filter for the same-day criterion, but the
function field parameter, which I expect to return the document's date value,
is being ignored.
For example a query to get all journeys from AAA to AAB is:

q={!graph
   maxDepth=1
   from=Origin
   to=Destination
   traversalFilter='Date:{!func}Date'} Origin:AAA
&fq=DestinationAirportCode:AAB || originAirportCode:AAA

What is the correct approach for this problem?
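Note: traversalFilter takes an ordinary query that is applied to the documents
matched at each hop; it is parsed once, up front, so it cannot refer to the
current document's own field value, which is likely why the {!func} form above
is ignored. A fixed single-day filter does work. A sketch reusing the fields
above, issued once per travel day:

q={!graph
   maxDepth=1
   from=Origin
   to=Destination
   traversalFilter='Date:[2021-03-04T00:00:00Z TO 2021-03-04T23:59:59Z]'} Origin:AAA

Comparing dates between the documents of successive hops is beyond what the
traversal filter can express.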

Cheers Lee C


Graph query from A to X[n] when number of hops is not known

2021-03-03 Thread Sravani Kambhampati
Hi,

How do I run a graph query from A to X when the number of hops is not known,
but the graph query for each hop remains the same?


For example:
If my graph looks like this,
id:A -> pk:A1 -> tgt:A2
id:B -> pk:B1 -> tgt:B2
...
id:X

To get from A to B,

  1.  We query A to A2 using (id->pk) + (pk -> tgt) {!graph from=tgt 
to=pk}{!graph from=pk to=id}id:A
  2.  Then from A2 to B using (tgt -> id) {!graph from=id to=tgt}


To get from A to C, steps 1 and 2 will be repeated:
{!graph from=id to=tgt}{!graph from=tgt to=pk}{!graph from=pk to=id}{!graph 
from=id to=tgt}{!graph from=tgt to=pk}{!graph from=pk to=id}id:A

Likewise, given a start node A, is it possible to query for X when the number
of hops is unknown but the query is the same for every hop?
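Note: for a single from/to relation the graph parser can itself traverse an
unbounded number of hops, since maxDepth defaults to -1 (no limit). A sketch
reusing the fields above:

q={!graph from=id to=tgt maxDepth=-1}id:A

maxDepth applies to one from/to pair only, though, so it does not directly
express a compound hop built from several nested parsers like the one above.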

Thanks,
Sravani


Query response time long for dynamicField in Solr 6.1.0

2021-03-02 Thread vishal patel
I am using Solr 6.1.0. We have 2 shards and each has one replica.

My schema field is below in one collection


When I execute the below query, it takes more than 180 milliseconds every time.
http://10.38.33.24:8983/solr/forms/select?q=project_id:(2117627+2102977+2109667+2102912+2113720+2102976+2102478+2114939+2101443+2123237+2078189+2086596+2079707+2079706+2079705+2079658+2088340+2088338+2113641+2117131+2117672+2120870+2079708+2113718+2096308+2125462+2117837+2115406+2123865+2081232+2080746+2081239+2082706+2098700+2103039+2098699+2082878+2082877+2079994+2113719+2107255+2103251+2100558+2112735+2100036+2100037+2115359+2099330+2112101+2115360+2112070+2125140+2103656+2090184+2090183+2088269+2088270+2115358+2113036+2096855+2098258+2097226+2097225+2113127+2102847+2081187+2082817+2085678+2085677+2100937+2116632+2117133+2121028+2102479+2080006+2117509+2091443+2094716+2109780+2109779+2102735+2102736+2102685+2101923+2103648+2102608+2102480+2103664+2079205+2075380+2079206+2091442+2088614+2088613+2079876+2079875+2082886+2088615+2079429+2079428+2117185+2082859+2082860+2125270+2081301+2117623+2112740+2086757+2086756+2101344+2086597+2086847+2102648+2113362+2109010+2100223+2079877+2082704+2109669+2103649+2100744+2101490+2117526+2117134+2124020+2124021+2123524+2127200+2125039+2103663)=updated+desc,id+desc=0=30==id,form_id,project_id,doctype,dc,form_type_id,status_id,originator_user_id,controller_user_id,form_num,originator_proxy_user_id,originator_user_type_id,controller_user_type_id,msg_id,msg_originator_id,msg_status_id,parent_msg_id,msg_type_id,msg_code,form_code,appType,instance_group_id,bim_model_id,is_draft,InvoiceColourCode,InvoiceCountAgainstOrder,msg_content,msg_content1,msg_content3,user_ref,form_type_name,form_group_name,observationId,locationId,pf_loc_folderId,hasFormAssociation,hasCommentAssociation,hasDocAssociation,hasBimViewAssociation,hasBimListAssociation,originator_org_id,form_closeby_date,form_creation_date,status_change_userId,status_update_date,lastmodified,is_public,title,*Start_Date,*Tender_End_Date,*Tender_End_Time,*Tender_Review_Date,*Tender_Review_Time,*TenderEndDatePassed,*Package_Description,*Budget,*Currency_Sign,*allowExternalVendor,*Enable_form_public_link,*Is_Tender_Public=off=true=http://10.38.33.24:8983/solr/forms,http://10.38.33.227:8983/solr/forms=true=form_id=msg_creation_date+desc=true

When I execute the below query, it takes less than 80 milliseconds every time.
http://10.38.33.24:8983/solr/forms/select?q=project_id:(2117627+2102977+2109667+2102912+2113720+2102976+2102478+2114939+2101443+2123237+2078189+2086596+2079707+2079706+2079705+2079658+2088340+2088338+2113641+2117131+2117672+2120870+2079708+2113718+2096308+2125462+2117837+2115406+2123865+2081232+2080746+2081239+2082706+2098700+2103039+2098699+2082878+2082877+2079994+2113719+2107255+2103251+2100558+2112735+2100036+2100037+2115359+2099330+2112101+2115360+2112070+2125140+2103656+2090184+2090183+2088269+2088270+2115358+2113036+2096855+2098258+2097226+2097225+2113127+2102847+2081187+2082817+2085678+2085677+2100937+2116632+2117133+2121028+2102479+2080006+2117509+2091443+2094716+2109780+2109779+2102735+2102736+2102685+2101923+2103648+2102608+2102480+2103664+2079205+2075380+2079206+2091442+2088614+2088613+2079876+2079875+2082886+2088615+2079429+2079428+2117185+2082859+2082860+2125270+2081301+2117623+2112740+2086757+2086756+2101344+2086597+2086847+2102648+2113362+2109010+2100223+2079877+2082704+2109669+2103649+2100744+2101490+2117526+2117134+2124020+2124021+2123524+2127200+2125039+2103663)=updated+desc,id+desc=0=30==id,form_id,project_id,doctype,dc,form_type_id,status_id,originator_user_id,controller_user_id,form_num,originator_proxy_user_id,originator_user_type_id,controller_user_type_id,msg_id,msg_originator_id,msg_status_id,parent_msg_id,msg_type_id,msg_code,form_code,appType,instance_group_id,bim_model_id,is_draft,InvoiceColourCode,InvoiceCountAgainstOrder,msg_content,msg_content1,msg_content3,user_ref,form_type_name,form_group_name,observationId,locationId,pf_loc_folderId,hasFormAssociation,hasCommentAssociation,hasDocAssociation,hasBimViewAssociation,hasBimListAssociation,originator_org_id,form_closeby_date,form_creation_date,status_change_userId,status_update_date,lastmodified,is_public,title,/myFields/FORM_CUSTOM_FIELDS/Bid_Opportunity/Start_Date,/myFields/FORM_CUSTOM_FIELDS/Bid_Opportunity/Tender_End_Date,/myFields/FORM_CUSTOM_FIELDS/Bid_Opportunity/Tender_End_Time,/myFields/FORM_CUSTOM_FIELDS/Bid_Opportunity/Tender_Review_Date,/myFields/FORM_CUSTOM_FIELDS/Bid_Opportunity/Tender_Review_Time,/myFields/FORM_CUSTOM_FIELDS/Bid_Opportunity/TenderEndDatePassed,/myFields/FORM_CUSTOM_FIELDS/Bid_Opportunity/Package_Description,/myFields/FORM_CUSTOM_FIELDS/Bid_Opportunity/Budget,/myFields/FORM_CUSTOM_FIELDS/ORI_MSG_Custom_Fields/Currency_Sign,/myFields/FORM_CUSTOM_FIELDS/Bid_Opportunity/allowExternalVendor,/myFields/FORM_CUSTOM_FIELDS/ORI_MSG_Custom_Fields/Enable_form_public_link,/myFields/FORM_CUSTOM_FIELDS/Bid_Opportunity

RE: Schema API specifying different analysers for query and index

2021-03-02 Thread ufuk yılmaz
It worked! Thanks Mr. Rafalovitch. I just removed the "type": "query" keys from
the JSON, and used indexAnalyzer and queryAnalyzer in place of the analyzer
JSON node.
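
For reference, the working command would look roughly like this (a sketch based
on the string_ci field type quoted below in the thread, not the exact JSON
that was used):

{
  "replace-field-type": {
    "name": "string_ci",
    "class": "solr.TextField",
    "sortMissingLast": true,
    "omitNorms": true,
    "stored": true,
    "docValues": false,
    "indexAnalyzer": {
      "tokenizer": { "class": "solr.KeywordTokenizerFactory" },
      "filters": [ { "class": "solr.LowerCaseFilterFactory" } ]
    },
    "queryAnalyzer": {
      "tokenizer": { "class": "solr.StandardTokenizerFactory" },
      "filters": [ { "class": "solr.LowerCaseFilterFactory" } ]
    }
  }
}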

Sent from Mail for Windows 10

From: Alexandre Rafalovitch
Sent: 03 March 2021 01:19
To: solr-user
Subject: Re: Schema API specifying different analysers for query and index

RefGuide gives this for Adding, I would hope the Replace would be similar:

curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field-type":{
 "name":"myNewTextField",
 "class":"solr.TextField",
 "indexAnalyzer":{
"tokenizer":{
   "class":"solr.PathHierarchyTokenizerFactory",
   "delimiter":"/" }},
 "queryAnalyzer":{
"tokenizer":{
   "class":"solr.KeywordTokenizerFactory" }}}
}' http://localhost:8983/solr/gettingstarted/schema

So, indexAnalyzer/queryAnalyzer, rather than array:
https://lucene.apache.org/solr/guide/8_8/schema-api.html#add-a-new-field-type

Hope this works,
Alex.
P.s. Also check whether you are using matching API and V1/V2 end point.

On Tue, 2 Mar 2021 at 15:25, ufuk yılmaz  wrote:
>
> Hello,
>
> I’m trying to change a field’s query analysers. The following works but it 
> replaces both index and query type analysers:
>
> {
> "replace-field-type": {
> "name": "string_ci",
> "class": "solr.TextField",
> "sortMissingLast": true,
> "omitNorms": true,
> "stored": true,
> "docValues": false,
> "analyzer": {
> "type": "query",
> "tokenizer": {
>     "class": "solr.StandardTokenizerFactory"
> },
> "filters": [
> {
> "class": "solr.LowerCaseFilterFactory"
> }
> ]
> }
> }
> }
>
> I tried to change analyzer field to analyzers, to specify different analysers 
> for query and index, but it gave error:
>
> {
> "replace-field-type": {
> "name": "string_ci",
> "class": "solr.TextField",
> "sortMissingLast": true,
> "omitNorms": true,
> "stored": true,
> "docValues": false,
> "analyzers": [{
> "type": "query",
> "tokenizer": {
> "class": "solr.StandardTokenizerFactory"
> },
> "filters": [
> {
> "class": "solr.LowerCaseFilterFactory"
> }
> ]
> },{
> "type": "index",
> "tokenizer": {
> "class": "solr.KeywordTokenizerFactory"
> },
> "filters": [
> {
> "class": "solr.LowerCaseFilterFactory"
> }
> ]
> }]
> }
> }
>
> "errorMessages":["Plugin init failure for [schema.xml]
> "msg":"error processing commands",...
>
> How can I specify different analyzers for query and index type when using 
> schema api?
>
> Sent from Mail for Windows 10
>



Re: Schema API specifying different analysers for query and index

2021-03-02 Thread Alexandre Rafalovitch
RefGuide gives this for Adding, I would hope the Replace would be similar:

curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field-type":{
 "name":"myNewTextField",
 "class":"solr.TextField",
 "indexAnalyzer":{
"tokenizer":{
   "class":"solr.PathHierarchyTokenizerFactory",
   "delimiter":"/" }},
 "queryAnalyzer":{
"tokenizer":{
   "class":"solr.KeywordTokenizerFactory" }}}
}' http://localhost:8983/solr/gettingstarted/schema

So, indexAnalyzer/queryAnalyzer, rather than array:
https://lucene.apache.org/solr/guide/8_8/schema-api.html#add-a-new-field-type

Hope this works,
Alex.
P.s. Also check whether you are using matching API and V1/V2 end point.

On Tue, 2 Mar 2021 at 15:25, ufuk yılmaz  wrote:
>
> Hello,
>
> I’m trying to change a field’s query analysers. The following works but it 
> replaces both index and query type analysers:
>
> {
> "replace-field-type": {
> "name": "string_ci",
> "class": "solr.TextField",
> "sortMissingLast": true,
> "omitNorms": true,
> "stored": true,
> "docValues": false,
> "analyzer": {
> "type": "query",
> "tokenizer": {
>     "class": "solr.StandardTokenizerFactory"
> },
> "filters": [
> {
> "class": "solr.LowerCaseFilterFactory"
> }
> ]
> }
> }
> }
>
> I tried to change analyzer field to analyzers, to specify different analysers 
> for query and index, but it gave error:
>
> {
> "replace-field-type": {
> "name": "string_ci",
> "class": "solr.TextField",
> "sortMissingLast": true,
> "omitNorms": true,
> "stored": true,
> "docValues": false,
> "analyzers": [{
> "type": "query",
> "tokenizer": {
> "class": "solr.StandardTokenizerFactory"
> },
> "filters": [
> {
> "class": "solr.LowerCaseFilterFactory"
> }
> ]
> },{
> "type": "index",
> "tokenizer": {
> "class": "solr.KeywordTokenizerFactory"
> },
> "filters": [
> {
> "class": "solr.LowerCaseFilterFactory"
> }
> ]
> }]
> }
> }
>
> "errorMessages":["Plugin init failure for [schema.xml]
> "msg":"error processing commands",...
>
> How can I specify different analyzers for query and index type when using 
> schema api?
>
> Sent from Mail for Windows 10
>


Schema API specifying different analysers for query and index

2021-03-02 Thread ufuk yılmaz
Hello,

I’m trying to change a field’s query analysers. The following works but it 
replaces both index and query type analysers:

{
"replace-field-type": {
"name": "string_ci",
"class": "solr.TextField",
"sortMissingLast": true,
"omitNorms": true,
"stored": true,
"docValues": false,
"analyzer": {
"type": "query",
"tokenizer": {
"class": "solr.StandardTokenizerFactory"
},
"filters": [
{
"class": "solr.LowerCaseFilterFactory"
}
]
}
}
}

I tried to change analyzer field to analyzers, to specify different analysers 
for query and index, but it gave error:

{
"replace-field-type": {
"name": "string_ci",
"class": "solr.TextField",
"sortMissingLast": true,
"omitNorms": true,
"stored": true,
"docValues": false,
"analyzers": [{
"type": "query",
"tokenizer": {
"class": "solr.StandardTokenizerFactory"
},
"filters": [
{
"class": "solr.LowerCaseFilterFactory"
}
]
},{
"type": "index",
"tokenizer": {
"class": "solr.KeywordTokenizerFactory"
},
"filters": [
{
"class": "solr.LowerCaseFilterFactory"
}
]
}]
}
}

"errorMessages":["Plugin init failure for [schema.xml]
"msg":"error processing commands",...

How can I specify different analyzers for query and index type when using 
schema api?

Sent from Mail for Windows 10



Multivalued text_general field returns lowercased value in "if" function query

2021-02-23 Thread ufuk yılmaz
I have a type="text_general" multiValued="true" field, named fieldA.

When I use a function query in the field list, like

fl=if(true, fieldA, -1), fieldA

Response is:

"response":{"numFound":1,"start":0,"maxScore":4.6553917,"docs":[
  {
"fieldA":["SomeMixedCaseValue"],
"if(true,fieldA,-1)":"somemixedcasevalue"}]
}}

Is this a bug or an expected output? Is there a way to avoid it getting 
lowercased?
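
Function queries on an analyzed text field read the indexed terms rather than
the stored value, so the result reflects whatever the analysis chain produced;
lowercased here, assuming the usual LowerCaseFilterFactory in text_general. A
common workaround is to copy the field into an unanalyzed string field and
reference that in the function instead. A sketch, where fieldA_str is a
hypothetical string field:

<copyField source="fieldA" dest="fieldA_str"/>

fl=if(true,fieldA_str,-1),fieldA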

Whole field definition is:


-ufuk yilmaz

Sent from Mail for Windows 10



Query regarding integrating solr query functions into blockfacetjoin Query

2021-02-22 Thread Ravi Kumar
Hi Team,

I was implementing block join faceting query in my project and was stuck in
integrating the existing functional queries in the block join faceting
query.

The current query, using the 'select' handler, is as follows:
https://localhost:8983/solr/master_Product_default/select?yq
=_query_:%22\{\!multiMaxScore\+tie%3D0.0\}\(\(bomCode_bc_string\:samsung\)\+OR\+\(description_text_en\:samsung\)\+OR\+\(belleaprice_cad_bc846_string\:samsung\^20.0\)\+OR\+\(name_text_en\:samsung\^50.0\)\+OR\+\(category_string_mv\:samsung\^20.0\)\)\+OR\+\(\(belleaprice_cad_bc846_string\:samsung\~\^10.0\)\)\+OR\+\(\(bomCode_bc_string\:\%22samsung\%22\^50.0\)\+OR\+\(code_string\:\%22samsung\%22\~1.0\^90.0\)\+OR\+\(vendorId_string\:\%22samsung\%22\^95.0\)\+OR\+\(description_text_en\:\%22samsung\%22\^50.0\)\+OR\+\(belleaprice_cad_bc846_string\:\%22samsung\%22\^40.0\)\+OR\+\(name_text_en\:\%22samsung\%22\^100.0\)\+OR\+\(category_string_mv\:\%22samsung\%22\^40.0\)\+OR\+\(upcCode_bc846_string\:\%22samsung\%22\^99.0\)\)%22&
yab
=sum(product(and(not(exists(omniOnlineStockStatus_boolean)),exists(inStoreStockStatus_bc846_bellea_boolean)),70.0),product(and(exists(omniOnlineStockStatus_boolean),exists(inStoreStockStatus_bc846_bellea_boolean)),80.0),product(and(exists(omniOnlineStockStatus_boolean),not(exists(inStoreStockStatus_bc846_bellea_boolean))),40.0),product(exists(omniInStoreStockStatus_bc_boolean),20.0))&q={!boost}(+{!lucene
v=$yq} {!func v=$yab})
=(omniAssortment_bc846_boolean:true+OR+omniAssortment_a002_boolean:true)=(srpPriceValue_bc846_string:[0.0+TO+*])=(omniVisible_20_bellea_bc_boolean:true)=(catalogId:%22belleaProductCatalog%22+AND+catalogVersion:%22Online%22)=score+desc,omniInStoreStockStatus_bc_boolean+asc,creationtime_sortable_date+desc,inStoreStockStatus_bc846_bellea_boolean+asc,omniOnlineStockStatus_boolean+asc=0=2=characteristics_string=inStoreStockStatus_bc846_bellea_boolean=memorySize_string_mv=color_en_string=belleaprice_cad_bc846_string=supplier_string=model_string_mv=omniOnlineStockStatus_boolean=category_string_mv=omniInStoreStockStatus_bc_boolean=stockAvailability_string=true=count=1=11=score,*=[child+parentFilter%3D%22itemtype_string:Product%22+childFilter%3D%22brands_stringignorecase_mv:BC+AND+regions_stringignorecase_mv:ON+AND+activationTypes_stringignorecase_mv:N+AND+channels_stringignorecase_mv:NR+AND+banners_stringignorecase_mv:\%22Walmart\%22+AND+(accountTypes_stringignorecase_mv:IR+OR+accountTypes_stringignorecase_mv:empty)%22+limit%3D1000]=true=samsung=en=true

In the above query, the 'yq' and 'yab' function queries are integrated into the
main query using the expression below:
  q={!boost}(+{!lucene v=$yq} {!func v=$yab})

I want to integrate the 'yq' and 'yab' function queries into the future
block join faceting query mentioned below:

https://localhost:8983/solr/master_Product_default/blockJoinFacetRH?
q={!parent%20which=%22itemtype_string:Product%22}itemtype_string:TierPrice=json=true=true=contract_string=500
=(omniAssortment_bc846_boolean:true+OR+omniAssortment_a002_boolean:true)=(srpPriceValue_bc846_string:[0.0+TO+*])=(omniVisible_20_bellea_bc_boolean:true)=(catalogId:%22belleaProductCatalog%22+AND+catalogVersion:%22Online%22)=score+desc,omniInStoreStockStatus_bc_boolean+asc,creationtime_sortable_date+desc,inStoreStockStatus_bc846_bellea_boolean+asc,omniOnlineStockStatus_boolean+asc=0=2000=characteristics_string=inStoreStockStatus_bc846_bellea_boolean=memorySize_string_mv=color_en_string=belleaprice_cad_bc846_string=supplier_string=model_string_mv=omniOnlineStockStatus_boolean=category_string_mv=omniInStoreStockStatus_bc_boolean=stockAvailability_string=true=count=1=11=score,*=[child+parentFilter%3D%22itemtype_string:Product%22+childFilter%3D%22brands_stringignorecase_mv:BC+AND+regions_stringignorecase_mv:ON+AND+activationTypes_stringignorecase_mv:N+AND+channels_stringignorecase_mv:NR+AND+banners_stringignorecase_mv:\%22Walmart\%22+AND+(accountTypes_stringignorecase_mv:IR+OR+accountTypes_stringignorecase_mv:empty)%22+limit%3D1000]=true=samsung=en=true

Can someone please suggest how I can add the expression '{!boost}(+{!lucene
v=$yq} {!func v=$yab})' to the block join faceting query
"q={!parent%20which=%22itemtype_string:Product%22}
itemtype_string:TierPrice=json=true=true=contract_string=500"?
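
Note: query parsers can be composed inside the default lucene parser as long
as each embedded sub-query carries its own v=$... reference. A sketch of the
combined request (untested; parameter names cq and pq are placeholders):

q=+{!parent which="itemtype_string:Product" v=$cq} +{!boost v=$pq}
cq=itemtype_string:TierPrice
pq=(+{!lucene v=$yq} {!func v=$yab})

The first clause keeps the block join that the block join facet component
needs, and the second mixes in the boosted relevance query against the parent
documents.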

I would be highly grateful if someone could offer some insight.

Thanks & Regards,

Ravi Kumar
SAP Hybris Consultant


Re: Solr 8.0 query length limit

2021-02-18 Thread Anuj Bhargava
Thanks Alex and Shawn.

Regards,

Anuj

On Thu, 18 Feb 2021 at 18:57, Shawn Heisey  wrote:

> On 2/18/2021 3:38 AM, Anuj Bhargava wrote:
> > Solr 8.0 query length limit
> >
> > We are having an issue where queries are too big, we get no result. And
> if
> > we remove a few keywords we get the result.
>
> The best option is to convert the request to POST, as Thomas suggested.
>   With that, the query parameters could be up to 2 megabytes in size
> with no config changes.
>
> The limit for this is enforced by Jetty -- the servlet container that
> Solr ships with.  If you cannot switch your requests to POST, then you
> can find the following line in server/etc/jetty.xml, adjust it, and
> restart Solr:
>
>   <Set name="requestHeaderSize"><Property name="solr.jetty.request.header.size" default="8192" /></Set>
>
> A header limit of 8KB is found in nearly all web servers and related
> software, like load balancers.
>
> Thanks,
> Shawn
>


Re: Solr 8.0 query length limit

2021-02-18 Thread Shawn Heisey

On 2/18/2021 3:38 AM, Anuj Bhargava wrote:

Solr 8.0 query length limit

We are having an issue where queries are too big, we get no result. And if
we remove a few keywords we get the result.


The best option is to convert the request to POST, as Thomas suggested. 
 With that, the query parameters could be up to 2 megabytes in size 
with no config changes.


The limit for this is enforced by Jetty -- the servlet container that 
Solr ships with.  If you cannot switch your requests to POST, then you 
can find the following line in server/etc/jetty.xml, adjust it, and 
restart Solr:


name="solr.jetty.request.header.size" default="8192" />


A header limit of 8KB is found in nearly all web servers and related 
software, like load balancers.


Thanks,
Shawn


Re: Solr 8.0 query length limit

2021-02-18 Thread Alexandre Rafalovitch
Also, investigate if you have repeating conditions and push those into
defaults in custom request handler endpoints (in solrconfig.xml).

Also, Solr supports parameter substitutions, if you have repeated
subconditions.

Regards,
 Alex

On Thu., Feb. 18, 2021, 7:08 a.m. Thomas Corthals, 
wrote:

> You can send big queries as a POST request instead of a GET request.
>
> Op do 18 feb. 2021 om 11:38 schreef Anuj Bhargava :
>
> > Solr 8.0 query length limit
> >
> > We are having an issue where queries are too big, we get no result. And
> if
> > we remove a few keywords we get the result.
> >
> > Error we get - error 414 (Request-URI Too Long)
> >
> >
> > Have made the following changes in jetty.xml, still the same error
> >
> > <Set name="outputBufferSize"><Property name="solr.jetty.output.buffer.size" default="32768" /></Set>
> > <Set name="outputAggregationSize"><Property name="solr.jetty.output.aggregation.size" default="32768" /></Set>
> > <Set name="requestHeaderSize"><Property name="solr.jetty.request.header.size" default="65536" /></Set>
> > <Set name="responseHeaderSize"><Property name="solr.jetty.response.header.size" default="32768" /></Set>
> > <Set name="sendServerVersion"><Property name="solr.jetty.send.server.version" default="false" /></Set>
> > <Set name="sendDateHeader"><Property name="solr.jetty.send.date.header" default="false" /></Set>
> > <Set name="headerCacheSize"><Property name="solr.jetty.header.cache.size" default="1024" /></Set>
> > <Set name="delayDispatchUntilContent"><Property name="solr.jetty.delayDispatchUntilContent" default="false"/></Set>
> >
>


Re: Solr 8.0 query length limit

2021-02-18 Thread Thomas Corthals
You can send big queries as a POST request instead of a GET request.
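
For example with curl, against a placeholder collection:

curl http://localhost:8983/solr/mycollection/select \
  --data-urlencode 'q=title:(term1 OR term2 OR term3)' \
  --data-urlencode 'rows=10'

The --data-urlencode options are sent as an application/x-www-form-urlencoded
POST body, which is not subject to the URI length limit.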

Op do 18 feb. 2021 om 11:38 schreef Anuj Bhargava :

> Solr 8.0 query length limit
>
> We are having an issue where queries are too big, we get no result. And if
> we remove a few keywords we get the result.
>
> Error we get - error 414 (Request-URI Too Long)
>
>
> Have made the following changes in jetty.xml, still the same error
>
> <Set name="outputBufferSize"><Property name="solr.jetty.output.buffer.size" default="32768" /></Set>
> <Set name="outputAggregationSize"><Property name="solr.jetty.output.aggregation.size" default="32768" /></Set>
> <Set name="requestHeaderSize"><Property name="solr.jetty.request.header.size" default="65536" /></Set>
> <Set name="responseHeaderSize"><Property name="solr.jetty.response.header.size" default="32768" /></Set>
> <Set name="sendServerVersion"><Property name="solr.jetty.send.server.version" default="false" /></Set>
> <Set name="sendDateHeader"><Property name="solr.jetty.send.date.header" default="false" /></Set>
> <Set name="headerCacheSize"><Property name="solr.jetty.header.cache.size" default="1024" /></Set>
> <Set name="delayDispatchUntilContent"><Property name="solr.jetty.delayDispatchUntilContent" default="false"/></Set>
>


Solr 8.0 query length limit

2021-02-18 Thread Anuj Bhargava
Solr 8.0 query length limit

We are having an issue where queries are too big, we get no result. And if
we remove a few keywords we get the result.

Error we get - error 414 (Request-URI Too Long)


Have made the following changes in jetty.xml, still the same error

<Set name="outputBufferSize"><Property name="solr.jetty.output.buffer.size" default="32768" /></Set>
<Set name="outputAggregationSize"><Property name="solr.jetty.output.aggregation.size" default="32768" /></Set>
<Set name="requestHeaderSize"><Property name="solr.jetty.request.header.size" default="65536" /></Set>
<Set name="responseHeaderSize"><Property name="solr.jetty.response.header.size" default="32768" /></Set>
<Set name="sendServerVersion"><Property name="solr.jetty.send.server.version" default="false" /></Set>
<Set name="sendDateHeader"><Property name="solr.jetty.send.date.header" default="false" /></Set>
<Set name="headerCacheSize"><Property name="solr.jetty.header.cache.size" default="1024" /></Set>
<Set name="delayDispatchUntilContent"><Property name="solr.jetty.delayDispatchUntilContent" default="false"/></Set>


RE: Query over migrating a solr database from 7.7.1 to 8.7.0

2021-02-01 Thread Flowerday, Matthew J
Hi There 

Just as an update to this thread, I have resolved the issue. The new
schema.xml had these entries:

<field name="_root_" type="string" indexed="true" stored="false" docValues="false"/>
<fieldType name="_nest_path_" class="solr.NestPathField"/>
<field name="_nest_path_" type="_nest_path_"/>

Once I commented out the lines containing _root_ and _nest_path_ (as we
don't have nested documents) and restarted Solr, no further duplication
occurred on update.

Regards

Matthew

Matthew Flowerday | Consultant | ULEAF
Unisys | 01908 774830| matthew.flower...@unisys.com 
Address Enigma | Wavendon Business Park | Wavendon | Milton Keynes | MK17
8LX



THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY
MATERIAL and is for use only by the intended recipient. If you received this
in error, please contact the sender and delete the e-mail and its
attachments from all devices.
   

-Original Message-
From: Flowerday, Matthew J  
Sent: 15 January 2021 11:18
To: solr-user@lucene.apache.org
Subject: RE: Query over migrating a solr database from 7.7.1 to 8.7.0

EXTERNAL EMAIL - Be cautious of all links and attachments.




Re: Query is timing out.

2021-01-28 Thread Modassar Ather
Hi,

The above boolean query works fine when the number of rows fetched is small,
like 10 or 20, but when it is increased to a bigger number the query slows down.
Is document collection very expensive? Is there any configuration I am
missing?

*Solr setup details:*
Mode : SolrCloud
Number of Shards : 12
Index size : 3TB approximately.

Thanks,
Modassar

On Wed, Jan 27, 2021 at 7:15 PM Modassar Ather 
wrote:

> Hi,
>
> The boolean query with a bigger value for *rows *times out with the
> following message.
>
> The request took too long to iterate over terms. Timeout: timeoutAt
>
> Solr version : Solr 8.6.3
> Time allowed : 30
> Field  :  />
> Query : fl:(term1 OR term2 OR . OR term1)
> rows : 1
> wt : json/phps
>
> Recently we have migrated from Solr 6.5.1. The above query used to work
> very fast in this Solr version.
>
> Kindly provide your suggestions.
>
> Best,
> Modassar
>


Query is timing out.

2021-01-27 Thread Modassar Ather
Hi,

The boolean query with a bigger value for *rows *times out with the
following message.

The request took too long to iterate over terms. Timeout: timeoutAt

Solr version : Solr 8.6.3
Time allowed : 30
Field  : 
Query : fl:(term1 OR term2 OR . OR term1)
rows : 1
wt : json/phps

Recently we have migrated from Solr 6.5.1. The above query used to work
very fast in this Solr version.

Kindly provide your suggestions.

Best,
Modassar


RE: Query over migrating a solr database from 7.7.1 to 8.7.0

2021-01-15 Thread Flowerday, Matthew J
Hi Jim

 

Thanks for looking into it for me.

 

I did some more testing: I created a base Solr 7.7.1 database using
the 'out of the box' schema.xml and solrconfig and added this item manually
using the Solr Admin tool (Documents/XML):

 



<add><doc>
  <field name="id">ABCD-N1</field>
  <field name="title_t">A test</field>
</doc></add>



 

I then updated it using:

 



<add><doc>
  <field name="id">ABCD-N1</field>
  <field name="title_t">A test updated</field>
</doc></add>



 

It correctly updates and deletes the old copy. 

 

I then 'migrated' it to solr 8.7.0 and updated the record in the same manner
(using documents/XML) with this 

 



<add><doc>
  <field name="id">ABCD-N1</field>
  <field name="title_t">A test updated again</field>
</doc></add>



 

It created a new record without deleting the old record

 

{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "q":"*:*",
      "_":"1610703647168"}},
  "response":{"numFound":2,"start":0,"numFoundExact":true,"docs":[
      {
        "id":"ABCD-N1",
        "title_t":"A test updated",
        "_version_":1688944583266795520},
      {
        "id":"ABCD-N1",
        "title_t":"A test updated again",
        "_version_":1688950299184594944}]
  }}

 

It is almost as if the delete of the record from the segment set up 7.7.1 is
not recognised.

 

When I updated the record again using

 



<add><doc>
  <field name="id">ABCD-N1</field>
  <field name="title_t">A test updated again and again</field>
</doc></add>



 

It updated the newly created record  and deleted the old version.

 

{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "q":"*:*",
      "_":"1610703647168"}},
  "response":{"numFound":2,"start":0,"numFoundExact":true,"docs":[
      {
        "id":"ABCD-N1",
        "title_t":"A test updated",
        "_version_":1688944583266795520},
      {
        "id":"ABCD-N1",
        "title_t":"A test updated again and again",
        "_version_":1688950897568120832}]
  }}

 

I did further testing by turning on Lucene TRACE logging on my database; the
first update generated:

 

2021-01-15 09:38:30.138 INFO  (qtp1458091526-18) [   x:uleaf]
o.a.s.u.LoggingInfoStream [BD][qtp1458091526-18]: now apply del packet
(org.apache.solr.update.SolrIndexWriter@15e9adf2) to 10 segments,
mergeGen 0

2021-01-15 09:38:30.138 INFO  (qtp1458091526-18) [   x:uleaf]
o.a.s.u.LoggingInfoStream [BD][qtp1458091526-18]: applyTermDeletes took 0.44
msec for 10 segments and 1 del terms; 0 new deletions

 

Whilst the second update generated

 

2021-01-15 09:44:21.543 INFO  (qtp1458091526-17) [   x:uleaf]
o.a.s.u.LoggingInfoStream [BD][qtp1458091526-17]: now apply del packet
(org.apache.solr.update.SolrIndexWriter@15e9adf2) to 11 segments,
mergeGen 0

2021-01-15 09:44:21.544 INFO  (qtp1458091526-17) [   x:uleaf]
o.a.s.u.LoggingInfoStream [BD][qtp1458091526-17]: applyTermDeletes took 0.29
msec for 11 segments and 1 del terms; 1 new deletions

 

 

It seems that the delete does not find the document in the old
segment.

 

Could this be a bug in Solr?

 

Many thanks

 

Matthew

 

Matthew Flowerday | Consultant | ULEAF

Unisys | 01908 774830 | matthew.flower...@unisys.com

Address Enigma | Wavendon Business Park | Wavendon | Milton Keynes | MK17
8LX


 

From: Dyer, Jim  
Sent: 13 January 2021 18:21
To: solr-user@lucene.apache.org
Subject: RE: Query over migrating a solr database from 7.7.1 to 8.7.0

 

EXTERNAL EMAIL - Be cautious of all links and attachments.

I think if you have _root_ in schema.xml you should look elsewhere. My
memory is that merely adding this one line to schema.xml took care of our
problem.

 

From: Flowerday, Matthew J <matthew.flower...@gb.unisys.com>
Sent: Tuesday, January 12, 2021 3:23 AM
To: solr-user@lucene.apache.org
Subject: RE: Query over migrating a solr database from 7.7.1 to 8.7.0

 

Hi Jim

 

Thanks for getting back to me.

 

I checked the schema.xml that we are using and it has the line you
mentioned:

 

<field name="_root_" type="string" indexed="true" stored="false" docValues="false"/>

 

And this is the only reference (apart from within a comment

RE: Query over migrating a solr database from 7.7.1 to 8.7.0

2021-01-13 Thread Dyer, Jim
I think if you have _root_ in schema.xml you should look elsewhere. My memory
is that merely adding this one line to schema.xml took care of our problem.

From: Flowerday, Matthew J 
Sent: Tuesday, January 12, 2021 3:23 AM
To: solr-user@lucene.apache.org
Subject: RE: Query over migrating a solr database from 7.7.1 to 8.7.0

Hi Jim

Thanks for getting back to me.

I checked the schema.xml that we are using and it has the line you mentioned:

<field name="_root_" type="string" indexed="true" stored="false" docValues="false"/>

And this is the only reference (apart from within a comment) for _root_ in the
schema.xml. Does your schema.xml have further references to _root_ that I might
need? I also checked our solrconfig.xml file for any references to _root_ and
there are none.

Many Thanks

Matthew

Matthew Flowerday | Consultant | ULEAF
Unisys | 01908 774830| 
matthew.flower...@unisys.com
Address Enigma | Wavendon Business Park | Wavendon | Milton Keynes | MK17 8LX


From: Dyer, Jim <james.d...@ingramcontent.com>
Sent: 11 January 2021 22:58
To: solr-user@lucene.apache.org
Subject: RE: Query over migrating a solr database from 7.7.1 to 8.7.0

EXTERNAL EMAIL - Be cautious of all links and attachments.
When we upgraded from 7.x to 8.x, I ran into an issue similar to yours:  when 
updating an existing document in the index, the document would be duplicated 
instead of replaced as expected.  The solution was to add a "_root_" field to 
schema.xml like this:

<field name="_root_" type="string" indexed="true" stored="false" docValues="false"/>

It appeared that when a feature was added for nested documents, this field 
somehow became mandatory in order for updates to work properly, at least in 
some cases.

From: Flowerday, Matthew J <matthew.flower...@gb.unisys.com>
Sent: Saturday, January 9, 2021 4:44 AM
To: solr-user@lucene.apache.org
Subject: RE: Query over migrating a solr database from 7.7.1 to 8.7.0

Hi There

As a test I stopped Solr and ran the IndexUpgrader tool on the database to see 
if this might fix the issue. It completed OK but unfortunately the issue still 
occurs - a new version of the record on solr is created rather than updating 
the original record.

It looks to me as if the record created under 7.7.1 is somehow not being 
'marked as deleted' in the way that records created under 8.7.0 are. Is there a 
way for these records to be marked as deleted when they are updated.

Many Thanks

Matthew


Matthew Flowerday | Consultant | ULEAF
Unisys | 01908 774830| 
matthew.flower...@unisys.com
Address Enigma | Wavendon Business Park | Wavendon | Milton Keynes | MK17 8LX


From: Flowerday, Matthew J <matthew.flower...@gb.unisys.com>
Sent: 07 January 2021 12:25
To: solr-user@lucene.apache.org
Subject: Query over migrating a solr database from 7.7.1 to 8.7.0

Hi There

I have recently upgraded a solr database from 7.7.1 to 8.7.0 and not wiped the 
database and re-indexed (as this would take too long to run on site).

On my local windows machine I have a single solr server 7.7.1 installation

I upgraded in the following manner


  *   Installed windows solr 8.7.0 on my machine in a different folder
  *   Copied the core related folder (holding conf, data, lib, core.properties) 
from 7.7.1 to the new 8.7.0 folder
  *   Brought up the solr
  *   Checked that queries work through the Solr Admin Tool and our application

This all worked fine until I tried to update a record which had been created 
under 7.7.1. Instead of marking the old record as deleted it effectively 
created a new copy of the record with the change in and left the old image as 
still visible. When I updated the record again it then correctly updated the 
new

RE: Query over migrating a solr database from 7.7.1 to 8.7.0

2021-01-12 Thread Flowerday, Matthew J
Hi Jim

 

Thanks for getting back to me.

 

I checked the schema.xml that we are using and it has the line you
mentioned:

 

<field name="_root_" type="string" indexed="true" stored="false" docValues="false"/>

 

And this is the only reference (apart from within a comment) for _root_ in
the schema.xml. Does your schema.xml have further references to _root_ that
I might need? I also checked our solrconfig.xml file for any references to
_root_ and there are none.

 

Many Thanks

 

Matthew

 

Matthew Flowerday | Consultant | ULEAF

Unisys | 01908 774830 | matthew.flower...@unisys.com

Address Enigma | Wavendon Business Park | Wavendon | Milton Keynes | MK17
8LX

 


 

From: Dyer, Jim  
Sent: 11 January 2021 22:58
To: solr-user@lucene.apache.org
Subject: RE: Query over migrating a solr database from 7.7.1 to 8.7.0

 

EXTERNAL EMAIL - Be cautious of all links and attachments.

When we upgraded from 7.x to 8.x, I ran into an issue similar to yours:
when updating an existing document in the index, the document would be
duplicated instead of replaced as expected.  The solution was to add a
"_root_" field to schema.xml like this:

 

<field name="_root_" type="string" indexed="true" stored="false" docValues="false"/>

 

It appeared that when a feature was added for nested documents, this field
somehow became mandatory in order for updates to work properly, at least in
some cases.

 

From: Flowerday, Matthew J <matthew.flower...@gb.unisys.com>
Sent: Saturday, January 9, 2021 4:44 AM
To: solr-user@lucene.apache.org
Subject: RE: Query over migrating a solr database from 7.7.1 to 8.7.0

 

Hi There

 

As a test I stopped Solr and ran the IndexUpgrader tool on the database to
see if this might fix the issue. It completed OK but unfortunately the issue
still occurs - a new version of the record on solr is created rather than
updating the original record.

 

It looks to me as if the record created under 7.7.1 is somehow not being
'marked as deleted' in the way that records created under 8.7.0 are. Is
there a way for these records to be marked as deleted when they are updated.

 

Many Thanks

 

Matthew

 

 

Matthew Flowerday | Consultant | ULEAF

Unisys | 01908 774830 | matthew.flower...@unisys.com

Address Enigma | Wavendon Business Park | Wavendon | Milton Keynes | MK17
8LX

 


 

From: Flowerday, Matthew J <matthew.flower...@gb.unisys.com>
Sent: 07 January 2021 12:25
To: solr-user@lucene.apache.org
Subject: Query over migrating a solr database from 7.7.1 to 8.7.0

 

Hi There

 

I have recently upgraded a solr database from 7.7.1 to 8.7.0 and not wiped
the database and re-indexed (as this would take too long to run on site).

 

On my local windows machine I have a single solr server 7.7.1 installation

 

I upgraded in the following manner

 

*   Installed windows solr 8.7.0 on my machine in a different folder
*   Copied the core related folder (holding conf, data, lib,
core.properties) from 7.7.1 to the new 8.7.0 folder
*   Brought up the solr
*   Checked that queries work through the Solr Admin Tool and our
application

 

This all worked fine until I tried to update a record which had been created
under 7.7.1. Instead of marking the old record as deleted it effectively
created a new copy of the record with the change in and left the old image
as still visible. When I updated the record again it then correctly updated
the new 8.7.0 version without leaving the old image behind. If I created a
new record and then updated it the solr record would be updated correctly.
The issue only seemed to affect the old 7.7.1 created records.

 

An example of the duplication as follows (the first record is 7.7.1 created
version and the second record is the 8.7.0 version after carrying out an
update):

 

{

  "responseHeader":{

"status":0,

"

Combining edismax Parser with Block Join Parent Query Parser

2021-01-11 Thread Ravi Lodhi
Hello Guys,

Does Solr support the edismax parser with the Block Join Parent Query Parser? If
yes, could you provide the syntax or point me to a reference document? And how
does it affect performance?

I am working on a search screen in an eCommerce application's backend. The
requirement is to design an order search screen. We were thinking of using
a nested document approach. Order information document as parent and all
its items as child document. We need to perform keyword search on both
parent and child documents. By using Block Join Parent Query Parser we can
search only on child documents and can retrieve parents. The sample
document structure is given below. We need an "OR" condition between the
edismax query and the Block Join Parent Query Parser.

Is the nested-document approach a good fit for the order and order-items
data, or should we denormalize the data at either the parent or the child
level? Which schema design is best suited to this scenario?

e.g. If I search "WEB" then if this is found in any of the child documents
then the parent doc should return or if it is found on any parent document
then that parent should return.

Sample Parent doc:
{
"orderId": "ORD1",
"orderTypeId": "SALES",
"orderStatusId": "ORDER_APPROVED",
"orderStatusDescription": "Approved",
"orderDate": "2021-01-09T07:00:00Z",
"orderGrandTotal": "200",
"salesChannel": "WEB",
"salesRepNames": "Demo Supplier",
"originFacilityId": "FACILITY_01"
}

Sample Child doc:

{
"orderItemId": "ORD1",
"itemStatusId": "ORDER_APPROVED",
"itemStatusDescription": "Approved",
"productId": "P01",
"productName": "Demo Product",
"productInternalName": "Demo Product 01",
"productBrandName": "Demo Brand"
}
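
A sketch of the OR between a parent-side edismax query and a child-driven
block join (untested; docType:order stands for whatever filter matches every
parent document, and the qf lists are illustrative):

q={!edismax qf="salesChannel salesRepNames orderStatusDescription" v=$kw} OR {!parent which="docType:order" v=$childq}
childq={!edismax qf="productName productBrandName productInternalName" v=$kw}
kw=WEB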

Any Help on this will be much appreciated!

Thanks!
Ravi Lodhi


RE: Query over migrating a solr database from 7.7.1 to 8.7.0

2021-01-11 Thread Dyer, Jim
When we upgraded from 7.x to 8.x, I ran into an issue similar to yours:  when 
updating an existing document in the index, the document would be duplicated 
instead of replaced as expected.  The solution was to add a "_root_" field to 
schema.xml like this:

<field name="_root_" type="string" indexed="true" stored="false" docValues="false"/>

It appeared that when a feature was added for nested documents, this field 
somehow became mandatory in order for updates to work properly, at least in 
some cases.

From: Flowerday, Matthew J 
Sent: Saturday, January 9, 2021 4:44 AM
To: solr-user@lucene.apache.org
Subject: RE: Query over migrating a solr database from 7.7.1 to 8.7.0

Hi There

As a test I stopped Solr and ran the IndexUpgrader tool on the database to see 
if this might fix the issue. It completed OK but unfortunately the issue still 
occurs - a new version of the record on solr is created rather than updating 
the original record.

It looks to me as if the record created under 7.7.1 is somehow not being 
'marked as deleted' in the way that records created under 8.7.0 are. Is there a 
way for these records to be marked as deleted when they are updated.

Many Thanks

Matthew


Matthew Flowerday | Consultant | ULEAF
Unisys | 01908 774830| 
matthew.flower...@unisys.com
Address Enigma | Wavendon Business Park | Wavendon | Milton Keynes | MK17 8LX


From: Flowerday, Matthew J <matthew.flower...@gb.unisys.com>
Sent: 07 January 2021 12:25
To: solr-user@lucene.apache.org
Subject: Query over migrating a solr database from 7.7.1 to 8.7.0

Hi There

I have recently upgraded a solr database from 7.7.1 to 8.7.0 and not wiped the 
database and re-indexed (as this would take too long to run on site).

On my local windows machine I have a single solr server 7.7.1 installation

I upgraded in the following manner


  *   Installed windows solr 8.7.0 on my machine in a different folder
  *   Copied the core related folder (holding conf, data, lib, core.properties) 
from 7.7.1 to the new 8.7.0 folder
  *   Brought up the solr
  *   Checked that queries work through the Solr Admin Tool and our application

This all worked fine until I tried to update a record which had been created 
under 7.7.1. Instead of marking the old record as deleted it effectively 
created a new copy of the record with the change in and left the old image as 
still visible. When I updated the record again it then correctly updated the 
new 8.7.0 version without leaving the old image behind. If I created a new 
record and then updated it the solr record would be updated correctly. The 
issue only seemed to affect the old 7.7.1 created records.

An example of the duplication as follows (the first record is 7.7.1 created 
version and the second record is the 8.7.0 version after carrying out an 
update):

{
  "responseHeader":{
"status":0,
"QTime":4,
"params":{
  "q":"id:9901020319M01-N26",
  "_":"1610016003669"}},
  "response":{"numFound":2,"start":0,"numFoundExact":true,"docs":[
  {
"id":"9901020319M01-N26",
"groupId":"9901020319M01",
"urn":"N26",
"specification":"nominal",
"owningGroupId":"9901020319M01",
"description":"N26, Yates, Mike, Alan, Richard, MALE",
"group_t":"9901020319M01",
"nominalUrn_t":"N26",
"dateTimeCreated_dtr":"2020-12-30T12:00:53Z",
"dateTimeCreated_dt":"2020-12-30T12:00:53Z",
"title_t":"Captain",
"surname_t":"Yates",
"qualifier_t":"Voyager",
"forename1_t":"Mike",
"forename2_t":"Alan",
"forename3_t":"Richard",
"sex_t":"MALE",
"orderedType_t":"Nominal",
"_version_":1687507566832123904},
  {
"id":"9901020319M01-N26",
"groupId":"9901020319M01",
"urn":"N26",
   

RE: [solr8.7] not relevant results for chinese query

2021-01-11 Thread Bruno Mannina
Hi,

With this article
( https://opensourceconnections.com/blog/2011/12/23/indexing-chinese-in-solr/ ),
I have begun to understand what happens.

Has anyone already tried the Paoding algorithm with a recent Solr?
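
Note: recent Solr releases bundle an HMM-based Chinese tokenizer in the
analysis-extras module, which segments text into words rather than single
characters. A sketch of a fieldType using it (untested; the module's jars must
be on the classpath, and the SmartCN model targets simplified Chinese, so
results on traditional text such as 手機殼 may vary):

<fieldType name="text_zh" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.HMMChineseTokenizerFactory"/>
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>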


Thanks,
Bruno

-----Original Message-----
From: Bruno Mannina [mailto:bmann...@free.fr]
Sent: Sunday, January 10, 2021 17:57
To: solr-user@lucene.apache.org
Subject: [solr8.7] not relevant results for chinese query

Hello,



I try to use chinese language with my index.



My definition is:


But I get too many irrelevant results.



i.e. : With the query (phone case):

tizh:(手機殼)



my query is translated to:

tizh:(手 OR 機 OR 殼)



But:

tizh:(手 AND 機 AND 殼)

returns 0 result.



And:

tizh:”手機殼”

returns also 0 result.



Is it possible to improve my fieldType, or must I add something else?



Thanks,

Bruno








[solr8.7] not relevant results for chinese query

2021-01-10 Thread Bruno Mannina
Hello,

 

I try to use chinese language with my index.

 

My definition is:



But I get too many irrelevant results.

 

i.e. : With the query (phone case):

tizh:(手機殼)

 

my query is translated to:

tizh:(手 OR 機 OR 殼)

 

But:

tizh:(手 AND 機 AND 殼)

returns 0 result.

 

And:

tizh:”手機殼”

returns also 0 result.

 

Is it possible to improve my fieldType, or must I add something else?

 

Thanks,

Bruno

 





Re: Solr query with space (only) gives error

2021-01-09 Thread vstuart
Cross-posted / addressed (both me), here.

https://stackoverflow.com/questions/65620642/solr-query-with-space-only-q-20-stalls/65638561#65638561



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: Query over migrating a solr database from 7.7.1 to 8.7.0

2021-01-09 Thread matthew sporleder
Did you commit?

> On Jan 9, 2021, at 5:44 AM, Flowerday, Matthew J 
>  wrote:
> 
> 
> Hi There
>  
> As a test I stopped Solr and ran the IndexUpgrader tool on the database to 
> see if this might fix the issue. It completed OK but unfortunately the issue 
> still occurs – a new version of the record on solr is created rather than 
> updating the original record.
>  
> It looks to me as if the record created under 7.7.1 is somehow not being 
> ‘marked as deleted’ in the way that records created under 8.7.0 are. Is there 
> a way for these records to be marked as deleted when they are updated.
>  
> Many Thanks
>  
> Matthew
>  
>  
> Matthew Flowerday | Consultant | ULEAF
> Unisys | 01908 774830| matthew.flower...@unisys.com
> Address Enigma | Wavendon Business Park | Wavendon | Milton Keynes | MK17 8LX
>  
> 
>  
> 
>  
> 
>  
> 
> 
> 
> 
>  
> From: Flowerday, Matthew J  
> Sent: 07 January 2021 12:25
> To: solr-user@lucene.apache.org
> Subject: Query over migrating a solr database from 7.7.1 to 8.7.0
>  
> Hi There
>  
> I have recently upgraded a solr database from 7.7.1 to 8.7.0 and not wiped 
> the database and re-indexed (as this would take too long to run on site).
>  
> On my local windows machine I have a single solr server 7.7.1 installation
>  
> I upgraded in the following manner
>  
> Installed windows solr 8.7.0 on my machine in a different folder
> Copied the core related folder (holding conf, data, lib, core.properties) 
> from 7.7.1 to the new 8.7.0 folder
> Brought up the solr
> Checked that queries work through the Solr Admin Tool and our application
>  
> This all worked fine until I tried to update a record which had been created 
> under 7.7.1. Instead of marking the old record as deleted it effectively 
> created a new copy of the record with the change in and left the old image as 
> still visible. When I updated the record again it then correctly updated the 
> new 8.7.0 version without leaving the old image behind. If I created a new 
> record and then updated it the solr record would be updated correctly. The 
> issue only seemed to affect the old 7.7.1 created records.
>  
> An example of the duplication as follows (the first record is 7.7.1 created 
> version and the second record is the 8.7.0 version after carrying out an 
> update):
>  
> {
>   "responseHeader":{
> "status":0,
> "QTime":4,
> "params":{
>   "q":"id:9901020319M01-N26",
>   "_":"1610016003669"}},
>   "response":{"numFound":2,"start":0,"numFoundExact":true,"docs":[
>   {
> "id":"9901020319M01-N26",
> "groupId":"9901020319M01",
> "urn":"N26",
> "specification":"nominal",
> "owningGroupId":"9901020319M01",
> "description":"N26, Yates, Mike, Alan, Richard, MALE",
> "group_t":"9901020319M01",
> "nominalUrn_t":"N26",
> "dateTimeCreated_dtr":"2020-12-30T12:00:53Z",
> "dateTimeCreated_dt":"2020-12-30T12:00:53Z",
> "title_t":"Captain",
> "surname_t":"Yates",
> "qualifier_t":"Voyager",
> "forename1_t":"Mike",
> "forename2_t":"Alan",
> "forename3_t":"Richard",
> "sex_t":"MALE",
> "orderedType_t":"Nominal",
> "_version_":1687507566832123904},
>   {
> "id":"9901020319M01-N26",
> "groupId":"9901020319M01",
> "urn":"N26",
> "specification":"nominal",
> "owningGroupId":"9901020319M01",
> "description":"N26, Yates, Mike, Alan, Richard, MALE",
> "group_t":"9901020319M01",
> "nominalUrn_t":"N26",
> "dateTimeCreated_dtr":"2020-12-30T12:00:53Z",
> "dateTimeCreated_dt":"2020-12-30T12:00:53Z",
> &qu

RE: Query over migrating a solr database from 7.7.1 to 8.7.0

2021-01-09 Thread Flowerday, Matthew J
Hi There

 

As a test I stopped Solr and ran the IndexUpgrader tool on the database to
see if this might fix the issue. It completed OK but unfortunately the issue
still occurs - a new version of the record on solr is created rather than
updating the original record.

 

It looks to me as if the record created under 7.7.1 is somehow not being
'marked as deleted' in the way that records created under 8.7.0 are. Is
there a way for these records to be marked as deleted when they are updated.

 

Many Thanks

 

Matthew

 

 

Matthew Flowerday | Consultant | ULEAF

Unisys | 01908 774830 | matthew.flower...@unisys.com

Address Enigma | Wavendon Business Park | Wavendon | Milton Keynes | MK17
8LX

 


 

From: Flowerday, Matthew J  
Sent: 07 January 2021 12:25
To: solr-user@lucene.apache.org
Subject: Query over migrating a solr database from 7.7.1 to 8.7.0

 

Hi There

 

I have recently upgraded a solr database from 7.7.1 to 8.7.0 and not wiped
the database and re-indexed (as this would take too long to run on site).

 

On my local windows machine I have a single solr server 7.7.1 installation

 

I upgraded in the following manner

 

*   Installed windows solr 8.7.0 on my machine in a different folder
*   Copied the core related folder (holding conf, data, lib,
core.properties) from 7.7.1 to the new 8.7.0 folder
*   Brought up the solr
*   Checked that queries work through the Solr Admin Tool and our
application

 

This all worked fine until I tried to update a record which had been created
under 7.7.1. Instead of marking the old record as deleted it effectively
created a new copy of the record with the change in and left the old image
as still visible. When I updated the record again it then correctly updated
the new 8.7.0 version without leaving the old image behind. If I created a
new record and then updated it the solr record would be updated correctly.
The issue only seemed to affect the old 7.7.1 created records.

 

An example of the duplication as follows (the first record is 7.7.1 created
version and the second record is the 8.7.0 version after carrying out an
update):

 

{
  "responseHeader":{
    "status":0,
    "QTime":4,
    "params":{
      "q":"id:9901020319M01-N26",
      "_":"1610016003669"}},
  "response":{"numFound":2,"start":0,"numFoundExact":true,"docs":[
      {
        "id":"9901020319M01-N26",
        "groupId":"9901020319M01",
        "urn":"N26",
        "specification":"nominal",
        "owningGroupId":"9901020319M01",
        "description":"N26, Yates, Mike, Alan, Richard, MALE",
        "group_t":"9901020319M01",
        "nominalUrn_t":"N26",
        "dateTimeCreated_dtr":"2020-12-30T12:00:53Z",
        "dateTimeCreated_dt":"2020-12-30T12:00:53Z",
        "title_t":"Captain",
        "surname_t":"Yates",
        "qualifier_t":"Voyager",
        "forename1_t":"Mike",
        "forename2_t":"Alan",
        "forename3_t":"Richard",
        "sex_t":"MALE",
        "orderedType_t":"Nominal",
        "_version_":1687507566832123904},
      {
        "id":"9901020319M01-N26",
        "groupId":"9901020319M01",
        "urn":"N26",
        "specification":"nominal",
        "owningGroupId":"9901020319M01",
        "description":"N26, Yates, Mike, Alan, Richard, MALE",
        "group_t":"9901020319M01",
        "nominalUrn_t":"N26",
        "dateTimeCreated_dtr":"2020-12-30T12:00:53Z",
        "dateTimeCreated_dt":"2020-12-30T12:00:53Z",
        "title_t":"Captain",
        "surname_t":"Yates",
        "qualifier_t":"Voyager enterprise defiant yorktown xx yy",
        "forename1_t":"Mike",


Solr query with space (only) gives error

2021-01-07 Thread vstuart
I have a frontend that uses Ajax to query Solr.

It's working well, but if I enter a single space (nothing else) in the
input/search box, the URL in the browser will show

... index.html#q=%20

In that circumstance I get a 400 error (as there are no parameters in the
request), which is fine, but my web page stalls, waiting for a response.

If, however, I enter a semicolon ( ; ) rather than a space, then the page
immediately refreshes, albeit with no results ("displaying 0 to 0 of 0").
Also fine / expected.

My question is what is triggering the " " (%20) query fault in Solr, and how
do I address (ideally, ignore) it?



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html




Query over migrating a solr database from 7.7.1 to 8.7.0

2021-01-07 Thread Flowerday, Matthew J
Hi There

 

I have recently upgraded a solr database from 7.7.1 to 8.7.0 and not wiped
the database and re-indexed (as this would take too long to run on site).

 

On my local windows machine I have a single solr server 7.7.1 installation

 

I upgraded in the following manner

 

*   Installed windows solr 8.7.0 on my machine in a different folder
*   Copied the core related folder (holding conf, data, lib,
core.properties) from 7.7.1 to the new 8.7.0 folder
*   Brought up the solr
*   Checked that queries work through the Solr Admin Tool and our
application

 

This all worked fine until I tried to update a record which had been created
under 7.7.1. Instead of marking the old record as deleted it effectively
created a new copy of the record with the change in and left the old image
as still visible. When I updated the record again it then correctly updated
the new 8.7.0 version without leaving the old image behind. If I created a
new record and then updated it the solr record would be updated correctly.
The issue only seemed to affect the old 7.7.1 created records.

 

An example of the duplication as follows (the first record is 7.7.1 created
version and the second record is the 8.7.0 version after carrying out an
update):

 

{

  "responseHeader":{

"status":0,

"QTime":4,

"params":{

  "q":"id:9901020319M01-N26",

  "_":"1610016003669"}},

  "response":{"numFound":2,"start":0,"numFoundExact":true,"docs":[

  {

"id":"9901020319M01-N26",

"groupId":"9901020319M01",

"urn":"N26",

"specification":"nominal",

"owningGroupId":"9901020319M01",

"description":"N26, Yates, Mike, Alan, Richard, MALE",

"group_t":"9901020319M01",

"nominalUrn_t":"N26",

"dateTimeCreated_dtr":"2020-12-30T12:00:53Z",

"dateTimeCreated_dt":"2020-12-30T12:00:53Z",

"title_t":"Captain",

"surname_t":"Yates",

"qualifier_t":"Voyager",

"forename1_t":"Mike",

"forename2_t":"Alan",

"forename3_t":"Richard",

"sex_t":"MALE",

"orderedType_t":"Nominal",

"_version_":1687507566832123904},

  {

"id":"9901020319M01-N26",

"groupId":"9901020319M01",

"urn":"N26",

"specification":"nominal",

"owningGroupId":"9901020319M01",

"description":"N26, Yates, Mike, Alan, Richard, MALE",

"group_t":"9901020319M01",

"nominalUrn_t":"N26",

"dateTimeCreated_dtr":"2020-12-30T12:00:53Z",

"dateTimeCreated_dt":"2020-12-30T12:00:53Z",

"title_t":"Captain",

"surname_t":"Yates",

"qualifier_t":"Voyager enterprise defiant yorktown xx yy",

"forename1_t":"Mike",

"forename2_t":"Alan",

"forename3_t":"Richard",

"sex_t":"MALE",

"orderedType_t":"Nominal",

"_version_":1688224966566215680}]

  }}

 

I checked the schema and it does have a uniqueKey set up:

<uniqueKey>id</uniqueKey>

I was wondering if this behaviour is expected and if there is a way to make
sure that records created under a previous version are updated correctly (so
that the old data is deleted when updated).

 

Also, am I upgrading Solr correctly? It could be that the way I have
upgraded is causing this issue. (I tried hunting through the Solr
documentation online but struggled to find Windows upgrade notes; the
steps above I worked out by trial and error.)

 

Many thanks

 

Matthew

 

Matthew Flowerday | Consultant | ULEAF

Unisys | 01908 774830|  
matthew.flower...@unisys.com 

Address Enigma | Wavendon Business Park | Wavendon | Milton Keynes | MK17
8LX

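[For anyone hitting the same duplication: before wiping and re-indexing, a
quick way to measure how many ids are affected is to facet on the uniqueKey
field with facet.mincount=2. A minimal sketch, assuming a core named mycore
on the default port, with id being the string uniqueKey shown above:

    curl 'http://localhost:8983/solr/mycore/select' \
      --data-urlencode 'q=*:*' \
      --data-urlencode 'rows=0' \
      --data-urlencode 'facet=true' \
      --data-urlencode 'facet.field=id' \
      --data-urlencode 'facet.mincount=2' \
      --data-urlencode 'facet.limit=100'

Every id that comes back with a count of 2 or more has a stale pre-upgrade
copy still visible in the index.]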


Re: Why do I get different results for the same query with two Solr versions?

2021-01-04 Thread nettadalet
Tulsi wrote
> Can you post the managed schema and solrconfig content here ?

Schema for the 4.6 index (I omitted all non-relevant data):

[fieldType XML stripped by the mailing-list archive; a partly readable copy survives in the quoted messages further down this thread]

Schema for the 7.5 index (I omitted all non-relevant data):

[fieldType XML stripped by the mailing-list archive; a partly readable copy survives in the quoted messages further down this thread]

About the solrconfig.xml file - I don't think I can share it because it may
contain sensitive information. Is there something specific from this file
that may be relevant for our discussion?


Tulsi wrote
> Do try the solr admin analysis screen
> once as well to see the behaviour for this field.
> https://lucene.apache.org/solr/guide/7_6/index.html

I looked at the analysis screen, but it wasn't helpful. That's why I started
using the "debug=query" parameter and the content of parsedquery.



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
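[Besides the admin Analysis screen, the same information is exposed over HTTP
by the /analysis/field handler (registered by default in most configs, and
implicit in recent Solr versions), which is easier to diff between two Solr
installations. A minimal sketch, with a hypothetical core name and the field
type from this thread:

    curl 'http://localhost:8983/solr/mycore/analysis/field' \
      --data-urlencode 'analysis.fieldtype=text_type1' \
      --data-urlencode 'analysis.fieldvalue=KI_7df2f026' \
      --data-urlencode 'analysis.query=KI_7'

Running this against both 4.6 and 7.5 shows, token by token, what each filter
in the chain emits at index and query time.]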


Re: Why do I get different results for the same query with two Solr versions?

2020-12-29 Thread Tulsi Das
Can you post the managed schema and solrconfig content here ?

Do try the solr admin analysis screen
once as well to see the behaviour for this field.

https://lucene.apache.org/solr/guide/7_6/index.html

On Sun, 27 Dec, 2020, 6:54 pm nettadalet,  wrote:

> Thank you, that was helpful!
>
> For Solr 4.6 I get
> "parsedquery": "PhraseQuery(TITLE_ItemCode_t:\"ki 7\")"
>
> For Solr 7.5 I get
> "parsedquery":"+(+(TITLE_ItemCode_t:ki7 (+TITLE_ItemCode_t:ki
> +TITLE_ItemCode_t:7)))"
>
> So this is the cause of the difference in the search result, but I still
> don't know why the parsedquery is different between the two versions.
> Any idea/guess?
> Is it some internal implementation that changed sometime between 4.6 and
> 7.5?
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Re:Re: Why do I get different results for the same query with two Solr versions?

2020-12-29 Thread nettadalet
Hi,
thanks for the comment, but I tried to use both "sow=false" and "sow=true"
and I still get the same result. For query (TITLE_ItemCode_t:KI_7) I still
see:
Solr 4.6: "parsedquery": "PhraseQuery(TITLE_ItemCode_t:\"ki 7\")"
Solr 7.5: "parsedquery":"+(+(TITLE_ItemCode_t:ki7 (+TITLE_ItemCode_t:ki
+TITLE_ItemCode_t:7)))"



Tulsi wrote
> Hi ,
> Yes this look like related to sow (split on whitespace) param default
> behaviour change in solr 7.
> 
> The sow parameter (short for "Split on Whitespace") now defaults to
> false, which allows support for multi-word synonyms out of the box.
> This parameter is used with the eDismax and standard/"lucene" query
> parsers. If this parameter is not explicitly specified as true, query
> text will not be split on whitespace before analysis.
> 
> https://lucene.apache.org/solr/guide/7_0/major-changes-in-solr-7.html
> 
> 
> On Sun, 27 Dec, 2020, 8:25 pm nettadalet wrote:
> 
>> I added "defType=lucene" to both searches to make sure I use the same
>> query
>> parser, but it didn't change the results.
>>
>>
>>
>> --
>> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>>





--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re:Re: Re:Re: Why do I get different results for the same query with two Solr versions?

2020-12-28 Thread xiefengchang



sow defaults to false?
But the behaviour here looks like sow=true, right??
For Solr 7.5 I get
"parsedquery":"+(+(text1:ki7 (+text1:ki
+text1:7)))"


At 2020-12-28 01:13:29, "Tulsi Das"  wrote:
>Hi ,
>Yes this look like related to sow (split on whitespace) param default
>behaviour change in solr 7.
>
>The sow parameter (short for "Split on Whitespace") now defaults to
>false, which allows support for multi-word synonyms out of the box.
>This parameter is used with the eDismax and standard/"lucene" query
>parsers. If this parameter is not explicitly specified as true, query
>text will not be split on whitespace before analysis.
>
>https://lucene.apache.org/solr/guide/7_0/major-changes-in-solr-7.html
>
>
>On Sun, 27 Dec, 2020, 8:25 pm nettadalet,  wrote:
>
>> I added "defType=lucene" to both searches to make sure I use the same query
>> parser, but it didn't change the results.
>>
>>
>>
>> --
>> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>>


Re: Re:Re: Why do I get different results for the same query with two Solr versions?

2020-12-27 Thread Tulsi Das
Hi ,
Yes this look like related to sow (split on whitespace) param default
behaviour change in solr 7.

The sow parameter (short for "Split on Whitespace") now defaults to
false, which allows support for multi-word synonyms out of the box.
This parameter is used with the eDismax and standard/"lucene" query
parsers. If this parameter is not explicitly specified as true, query
text will not be split on whitespace before analysis.

https://lucene.apache.org/solr/guide/7_0/major-changes-in-solr-7.html


On Sun, 27 Dec, 2020, 8:25 pm nettadalet,  wrote:

> I added "defType=lucene" to both searches to make sure I use the same query
> parser, but it didn't change the results.
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Re:Re: Why do I get different results for the same query with two Solr versions?

2020-12-27 Thread nettadalet
I added "defType=lucene" to both searches to make sure I use the same query
parser, but it didn't change the results.



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Re:Re: Why do I get different results for the same query with two Solr versions?

2020-12-27 Thread nettadalet
I'm not sure how to check the implementation of the query parser, or how to
change the query parser that I use. I think I'm using the standard query
parser.

I use Solr Admin to run the queries. If I look at the URL, I see
Solr 4.6:
select?q=TITLE_ItemCode_t:KI_7=TITLE_ItemCode_t
Solr 7.5:
select?q=TITLE_ItemCode_t:KI_7=TITLE_ItemCode_t

Should I change something?
Where should I look?



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
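[In answer to "where should I look": the parser and its settings are plain
request parameters, so nothing in the schema needs to change to experiment.
A minimal sketch, with a hypothetical core name, forcing the parser and the
split-on-whitespace behaviour explicitly and asking for the parsed query back:

    curl 'http://localhost:8983/solr/mycore/select' \
      --data-urlencode 'q=TITLE_ItemCode_t:KI_7' \
      --data-urlencode 'defType=lucene' \
      --data-urlencode 'sow=true' \
      --data-urlencode 'debug=query'

Comparing the parsedquery output with and without sow=true shows whether the
Solr 7 whitespace-handling change is what alters the result.]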


Re:Re: Why do I get different results for the same query with two Solr versions?

2020-12-27 Thread xiefengchang
which query parser are you using? I think to answer your question, you need to 
check the implementation of the query parser

At 2020-12-27 21:23:59, "nettadalet"  wrote:
>Thank you, that was helpful!
>
>For Solr 4.6 I get 
>"parsedquery": "PhraseQuery(TITLE_ItemCode_t:\"ki 7\")"
>
>For Solr 7.5 I get
>"parsedquery":"+(+(TITLE_ItemCode_t:ki7 (+TITLE_ItemCode_t:ki
>+TITLE_ItemCode_t:7)))"
>
>So this is the cause of the difference in the search result, but I still
>don't know why the parsedquery is different between the two versions.
>Any idea/guess?
>Is it some internal implementation that changed sometime between 4.6 and
>7.5?
>
>
>
>--
>Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Why do I get different results for the same query with two Solr versions?

2020-12-27 Thread nettadalet
Thank you, that was helpful!

For Solr 4.6 I get 
"parsedquery": "PhraseQuery(TITLE_ItemCode_t:\"ki 7\")"

For Solr 7.5 I get
"parsedquery":"+(+(TITLE_ItemCode_t:ki7 (+TITLE_ItemCode_t:ki
+TITLE_ItemCode_t:7)))"

So this is the cause of the difference in the search result, but I still
don't know why the parsedquery is different between the two versions.
Any idea/guess?
Is it some internal implementation that changed sometime between 4.6 and
7.5?



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Why do I get different results for the same query with two Solr versions?

2020-12-24 Thread Tulsi Das
Hi,
Try adding debug=true or debug=query in the url and see the formed query at
the end .
You will get to know why the results are different.


On Thu, 24 Dec, 2020, 8:05 pm nettadalet,  wrote:

> Hello,
>
> I have the same field type defined in Solr 4.6 and Solr 7.5. When I
> search with both versions, I get different results, and I don't know why
>
> I have the following *field type definition in Solr 4.6*:
> <fieldType name="text_type1" class="solr.TextField" positionIncrementGap="1000">
>   <analyzer type="index">
>     <tokenizer class="..."/>  <!-- tokenizer class stripped by the archive -->
>     <filter class="solr.StopFilterFactory" words="stopwords.txt" />
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>             catenateAll="0" splitOnCaseChange="0"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="..."/>  <!-- tokenizer class stripped by the archive -->
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="0" catenateNumbers="0"
>             catenateAll="0" splitOnCaseChange="0"/>
>   </analyzer>
> </fieldType>
>
>
> I have the following *field type definition in Solr 7.5*:
> <fieldType name="text_type1" class="solr.TextField" positionIncrementGap="1000">
>   <analyzer type="index">
>     <tokenizer class="..."/>  <!-- tokenizer class stripped by the archive -->
>     <filter class="solr.StopFilterFactory" words="stopwords.txt" />
>     <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>             catenateAll="0" splitOnCaseChange="0"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="..."/>  <!-- tokenizer class stripped by the archive -->
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
>     <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="0" catenateNumbers="0"
>             catenateAll="0" splitOnCaseChange="0"/>
>   </analyzer>
> </fieldType>
>
> * I tried to use solr.WordDelimiterFilterFactory with Solr 7.5 instead of
> solr.WordDelimiterGraphFilterFactory so the field types will be more alike,
> but the result was the same.
>
> I have the following *6 values set for field text1 of type text_type1 for 6
> different documents* (the type(s) from above):
> KI_d5e7b43a
> KI_b7c490bd
> KI_7df2f026
> KI_fa7d129d
> KI_5867aec7
> KI_7c3c0b93
>
>
> My query is *text1=KI_7*.
> Using Solr 4.6, I get 2 results - KI_7df2f026, KI_7c3c0b93
> Using Solr 7.5, I get all 6 results.
>
> Questions:
> 1. How come I get different results with the same data, when my fields
> definitions are the same (as far as I can tell)?
>
> 2. What are the expected results?
> I think that the results Solr 7.5 returns are the correct ones, since at
> the end of the analysis I get *KI* as a term and *7* as a term, both
> during the indexing analysis and the query analysis, so, to my
> understanding, all 6 results should be found.
> Is this correct? if not, what am I missing? what don't I understand
> correctly?
>
> I would very much appreciate a full/partial answer, but even a link that
> could explain at least the expected results part would be great.
>
> Thanks in advance, I know this might be a tough one to answer [Hope not
> :)]
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Why do I get different results for the same query with two Solr versions?

2020-12-24 Thread nettadalet
Hello,

I have the same field type defined in Solr 4.6 and Solr 7.5. When I
search with both versions, I get different results, and I don't know why

I have the following *field type definition in Solr 4.6*:
<fieldType name="text_type1" class="solr.TextField" positionIncrementGap="1000">
  <analyzer type="index">
    <tokenizer class="..."/>  <!-- tokenizer class stripped by the archive -->
    <filter class="solr.StopFilterFactory" words="stopwords.txt" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="0"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="..."/>  <!-- tokenizer class stripped by the archive -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="0"/>
  </analyzer>
</fieldType>

I have the following *field type definition in Solr 7.5*:
<fieldType name="text_type1" class="solr.TextField" positionIncrementGap="1000">
  <analyzer type="index">
    <tokenizer class="..."/>  <!-- tokenizer class stripped by the archive -->
    <filter class="solr.StopFilterFactory" words="stopwords.txt" />
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="0"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="..."/>  <!-- tokenizer class stripped by the archive -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="0"/>
  </analyzer>
</fieldType>

* I tried to use solr.WordDelimiterFilterFactory with Solr 7.5 instead of
solr.WordDelimiterGraphFilterFactory so the field types will be more alike,
but the result was the same.

I have the following *6 values set for field text1 of type text_type1 for 6
different documents* (the type(s) from above):
KI_d5e7b43a
KI_b7c490bd
KI_7df2f026
KI_fa7d129d
KI_5867aec7
KI_7c3c0b93


My query is *text1=KI_7*.
Using Solr 4.6, I get 2 results - KI_7df2f026, KI_7c3c0b93
Using Solr 7.5, I get all 6 results.

Questions:
1. How come I get different results with the same data, when my fields
definitions are the same (as far as I can tell)?

2. What are the expected results?
I think that the results Solr 7.5 returns are the correct ones, since at the
end of the analysis I get *KI* as a term and *7* as a term, both
during the indexing analysis and the query analysis, so, to my
understanding, all 6 results should be found.
Is this correct? if not, what am I missing? what don't I understand
correctly?

I would very much appreciate a full/partial answer, but even a link that
could explain at least the expected results part would be great. 

Thanks in advance, I know this might be a tough one to answer [Hope not  :)]



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr cloud facet query returning incorrect results

2020-12-21 Thread Erick Erickson
This should work as you expect, so the first thing I’d do 
is add debug=query and see the parsed query in both cases.

If that doesn’t show anything, please post the 
full debug response in both cases.

Best,
Erick

> On Dec 21, 2020, at 4:31 AM, Alok Bhandari  wrote:
> 
> Hello All ,
> 
> we are using Solr6.2 , in schema that we use we have an integer field. For
> a given query we want to know how many documents have duplicate value for
> the field , for an example how many documents have same doc_id=10.
> 
> So to find this information we fire a query to solr-cloud with following
> parameters
> 
> "q":"organization:abc",
>  "facet.limit":"10",
>  "facet.field":"doc_id",
>  "indent":"on",
>  "fl":"archive_id",
>  "facet.mincount":"2",
>"facet":"true",
> 
> 
> But in response we get that there are no documents having duplicate
> doc_id as in facet query response we are not getting any facet_counts
> , but if I change the query to "q":"organization:abc AND doc_id:10"
> then in response I can see that there are 3 docs with doc_id=10.
> 
> This behavior seems contrary to how facets behave , so wanted to know
> if there is any possible reason for this type of behavior.



Solr cloud facet query returning incorrect results

2020-12-21 Thread Alok Bhandari
Hello All ,

we are using Solr6.2 , in schema that we use we have an integer field. For
a given query we want to know how many documents have duplicate value for
the field , for an example how many documents have same doc_id=10.

So to find this information we fire a query to solr-cloud with following
parameters

"q":"organization:abc",
  "facet.limit":"10",
  "facet.field":"doc_id",
  "indent":"on",
  "fl":"archive_id",
  "facet.mincount":"2",
"facet":"true",


But in response we get that there are no documents having duplicate
doc_id as in facet query response we are not getting any facet_counts
, but if I change the query to "q":"organization:abc AND doc_id:10"
then in response I can see that there are 3 docs with doc_id=10.

This behavior seems contrary to how facets behave , so wanted to know
if there is any possible reason for this type of behavior.
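[A JSON Facet API equivalent can be a useful cross-check here, since it makes
the mincount and limit explicit per facet. A minimal sketch of the same
request ("dup_doc_ids" is just a label chosen for this example), POSTed to
/query with Content-Type: application/json as in the JSON Facet examples
elsewhere in this digest:

    {
      "query": "organization:abc",
      "limit": 0,
      "facet": {
        "dup_doc_ids": {
          "type": "terms",
          "field": "doc_id",
          "mincount": 2,
          "limit": 10
        }
      }
    }

This should list every doc_id value occurring at least twice; if it disagrees
with the classic facet.field output, that narrows the problem to the faceting
path rather than the data.]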


Re: Function Query Optimization

2020-12-14 Thread Jae Joo
Should SubQuery be faster than FunctionQuery?

On Sat, Dec 12, 2020 at 10:24 AM Vincenzo D'Amore 
wrote:

> Hi, looking at this sample it seems you have just one document for '12345',
> one for '23456' and so on so forth. If this is true, why don't just try
> with a subquery
>
> https://lucene.apache.org/solr/guide/6_6/transforming-result-documents.html#TransformingResultDocuments-_subquery_
>
> On Fri, Dec 11, 2020 at 3:31 PM Jae Joo  wrote:
>
> > I have the requirement to create field  - xyz to be returned based on the
> > matched result.
> > Here Is the code .
> >
> > XYZ:concat(
> >
> > if(exists(query({!v='field1:12345'})), '12345', ''),
> >
> >     if(exists(query({!v='field1:23456'})), '23456', ''),
> >
> > if(exists(query({!v='field1:34567'})), '34567', ''),
> >
> > if(exists(query({!v='field:45678'})), '45678','')
> > ),
> >
> > I am feeling this is very complex, so I am looking for some smart and
> > faster ideas.
> >
> > Thanks,
> >
> > Jae
> >
>
>
> --
> Vincenzo D'Amore
>


Re: Function Query Optimization

2020-12-12 Thread Vincenzo D'Amore
Hi, looking at this sample it seems you have just one document for '12345',
one for '23456' and so on so forth. If this is true, why don't just try
with a subquery
https://lucene.apache.org/solr/guide/6_6/transforming-result-documents.html#TransformingResultDocuments-_subquery_

On Fri, Dec 11, 2020 at 3:31 PM Jae Joo  wrote:

> I have the requirement to create field  - xyz to be returned based on the
> matched result.
> Here Is the code .
>
> XYZ:concat(
>
> if(exists(query({!v='field1:12345'})), '12345', ''),
>
> if(exists(query({!v='field1:23456'})), '23456', ''),
>
> if(exists(query({!v='field1:34567'})), '34567', ''),
>
> if(exists(query({!v='field:45678'})), '45678','')
> ),
>
> I am feeling this is very complex, so I am looking for some smart and
> faster ideas.
>
> Thanks,
>
> Jae
>


-- 
Vincenzo D'Amore
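[For readers wanting the concrete shape of that suggestion: the [subquery]
document transformer attaches the results of a second query to every returned
document, with its parameters prefixed by the pseudo-field name. A minimal
sketch with hypothetical field names; the field referenced via $row should
have docValues:

    q=*:*&fl=id,dept_id,members:[subquery]
    members.q={!terms f=dept_id v=$row.dept_id}
    members.fl=id,field1
    members.rows=10

Whether it beats the concat/if/exists function string depends on how many
clauses the function has and how selective the main query is, so it is worth
benchmarking both.]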


Function Query Optimization

2020-12-11 Thread Jae Joo
I have the requirement to create field  - xyz to be returned based on the
matched result.
Here is the code:

XYZ:concat(

if(exists(query({!v='field1:12345'})), '12345', ''),

if(exists(query({!v='field1:23456'})), '23456', ''),

if(exists(query({!v='field1:34567'})), '34567', ''),

if(exists(query({!v='field:45678'})), '45678','')
),

I am feeling this is very complex, so I am looking for some smart and
faster ideas.

Thanks,

Jae


Re: nested facets of query and terms type in JSON format

2020-12-10 Thread Arturas Mazeika
Hi Jason,

Thanks a lot for the post. Indeed the web page you are referring to has
some very nice examples. Well done.

Cheers,
Arturas

Increasing the number of threads (through the url or params section) is
unsuccessful so far. Maybe solr takes it only as a hint.

On Thu, Dec 10, 2020 at 8:01 PM Jason Gerlowski 
wrote:

> Hey Arturas,
>
> Can't help you with the secrets of Michael's inspiration (though I'm
> also curious :-p).  And I'm not sure if there's any equivalent of
> facet.threads for JSON Faceting.  You're on your own there
> unfortunately.
>
> But you (or other readers) might find this "Query Facet" example handy
> - it uses the "type": "query" syntax that Michael mentioned. [1]
>
> [1]
> https://lucene.apache.org/solr/guide/8_5/json-facet-api.html#query-facet
>
> Best,
> Jason
>
> On Thu, Dec 3, 2020 at 5:49 PM Arturas Mazeika  wrote:
> >
> > Hi Michael,
> >
> > I wish I were able to do a percent of what you are doing. Where does your
> > inspiration come from? It is not from the manuals, cause I've checked
> > those. How do you come up with this piece of art? Did you check this from
> > the source code? Which lines revealed these secrets? I am eternally
> > grateful for your help!
> >
> > Michael, maybe you happen to know how I can plugin in facet.threads
> > parameter in that JSON body below, so the query uses more threads to
> > compute the answer? I am dying out of curiosity.
> >
> > Cheers,
> > Arturas
> >
> > On Thu, Dec 3, 2020 at 7:59 PM Michael Gibney  >
> > wrote:
> >
> > > I think the first "error" case in your set of examples above is
> closest to
> > > being correct. For "query" facet type, I think you want to explicitly
> > > specify `"type":"query"`, and specify the query itself in the `"q"`
> param,
> > > i.e.:
> > > {
> > > "query"  : "*:*",
> > > "limit"  : 0,
> > >
> > > "facet": {
> > > "aip": {
> > > "type":  "query",
> > > "q":  "cfname2:aip",
> > > "facet": {
> > > "t_buckets": {
> > > "type":  "range",
> > > "field": "t",
> > > "sort": { "t": "asc" },
> > >     "start": "2018-05-02T17:00:00.000Z",
> > > "end":   "2020-11-16T21:00:00.000Z",
> > > "gap":   "+1HOUR"
> > > "limit": 1
> > > }
> > > }
> > > }
> > > }
> > > }
> > >
> > > On Thu, Dec 3, 2020 at 12:59 PM Arturas Mazeika 
> wrote:
> > >
> > > > Hi Michael,
> > > >
> > > > Thanks for helping me to figure this out.
> > > >
> > > > If I fire:
> > > >
> > > > {
> > > > "query"  : "*:*",
> > > > "limit"  : 0,
> > > >
> > > > "facet": {
> > > > "aip": { "query":  "cfname2:aip", }
> > > >
> > > > }
> > > > }
> > > >
> > > > I get
> > > >
> > > > "response": { "numFound": 20560849, "start": 0, "numFoundExact":
> true,
> > > > "docs": [] }, "facets": { "count": 20560849, "aip": { "count": 2307
> } } }
> > > >
> > > > (works). If I fire
> > > >
> > > >
> > > > {
> > > > "query"  : "*:*",
> > > > "limit"  : 0,
> > > >
> > > > "facet": {
> > > > "t_buckets": {
> > > > "type":  "range",
> > > > "field": "t",
> > > > "sort": { "t": "asc" },
> > > > "start": "2018-05-02T17:00:00.000Z",
> > > > "end":   "2020-11-16T21:00:00.000Z",
> > > > "gap":   "+1

Re: nested facets of query and terms type in JSON format

2020-12-10 Thread Jason Gerlowski
Hey Arturas,

Can't help you with the secrets of Michael's inspiration (though I'm
also curious :-p).  And I'm not sure if there's any equivalent of
facet.threads for JSON Faceting.  You're on your own there
unfortunately.

But you (or other readers) might find this "Query Facet" example handy
- it uses the "type": "query" syntax that Michael mentioned. [1]

[1] https://lucene.apache.org/solr/guide/8_5/json-facet-api.html#query-facet

Best,
Jason

On Thu, Dec 3, 2020 at 5:49 PM Arturas Mazeika  wrote:
>
> Hi Michael,
>
> I wish I were able to do a percent of what you are doing. Where does your
> inspiration come from? It is not from the manuals, cause I've checked
> those. How do you come up with this piece of art? Did you check this from
> the source code? Which lines revealed these secrets? I am eternally
> grateful for your help!
>
> Michael, maybe you happen to know how I can plugin in facet.threads
> parameter in that JSON body below, so the query uses more threads to
> compute the answer? I am dying out of curiosity.
>
> Cheers,
> Arturas
>
> On Thu, Dec 3, 2020 at 7:59 PM Michael Gibney 
> wrote:
>
> > I think the first "error" case in your set of examples above is closest to
> > being correct. For "query" facet type, I think you want to explicitly
> > specify `"type":"query"`, and specify the query itself in the `"q"` param,
> > i.e.:
> > {
> > "query"  : "*:*",
> > "limit"  : 0,
> >
> > "facet": {
> > "aip": {
> > "type":  "query",
> > "q":  "cfname2:aip",
> > "facet": {
> > "t_buckets": {
> > "type":  "range",
> > "field": "t",
> > "sort": { "t": "asc" },
> > "start": "2018-05-02T17:00:00.000Z",
> > "end":   "2020-11-16T21:00:00.000Z",
> > "gap":   "+1HOUR"
> > "limit": 1
> > }
> > }
> > }
> > }
> > }
> >
> > On Thu, Dec 3, 2020 at 12:59 PM Arturas Mazeika  wrote:
> >
> > > Hi Michael,
> > >
> > > Thanks for helping me to figure this out.
> > >
> > > If I fire:
> > >
> > > {
> > > "query"  : "*:*",
> > > "limit"  : 0,
> > >
> > > "facet": {
> > > "aip": { "query":  "cfname2:aip", }
> > >
> > > }
> > > }
> > >
> > > I get
> > >
> > > "response": { "numFound": 20560849, "start": 0, "numFoundExact": true,
> > > "docs": [] }, "facets": { "count": 20560849, "aip": { "count": 2307 } } }
> > >
> > > (works). If I fire
> > >
> > >
> > > {
> > > "query"  : "*:*",
> > > "limit"  : 0,
> > >
> > > "facet": {
> > > "t_buckets": {
> > > "type":  "range",
> > > "field": "t",
> > > "sort": { "t": "asc" },
> > > "start": "2018-05-02T17:00:00.000Z",
> > > "end":   "2020-11-16T21:00:00.000Z",
> > > "gap":   "+1HOUR"
> > > "limit": 1
> > > }
> > > }
> > > }
> > >
> > > I get
> > >
> > > "response": { "numFound": 20560849, "start": 0, "numFoundExact": true,
> > > "docs": [] }, "facets": { "count": 20560849, "t_buckets": { "buckets": [
> > {
> > > "val": "2018-05-02T17:00:00Z", "count": 150 },
> > >
> > > (works). If I fire:
> > >
> > > {
> > > "query"  : "*:*",
> > > "limit"  : 0,
> > >
> > > "facet": {
> > > "aip": { "query":  "cfname2:aip"

Re: Can I express this nested query in JSON DSL?

2020-12-08 Thread Mikhail Khludnev
Hi, Mikhail.
Shouldn't be a big deal:
"bool": {
  "must": [
    "x",
    { "bool": { "should": ["y", "z"] } }
  ]
}

On Tue, Dec 8, 2020 at 6:13 AM Mikhail Edoshin 
wrote:

> Hi,
>
> I'm more or less new to Solr. I need to run queries that use joins all
> over the place. (The idea is to index database records pretty much as
> they are and then query them in interesting ways and, most importantly,
> get the rank. Our dataset is not too large so the performance is great.)
>
> I managed to express the logic using the following approach. For
> example, I want to search people by their names or addresses:
>
>q=type:Person^=0 AND ({!edismax qf= v=$p0} OR {!join
>  v=$p1})
>p1={!edismax qf= v=p0}
>p0=
>
> (Here 'type:Person' works as a filter so I zero its score.) This seems
> to work as expected and give the right results and ranking. It also
> seems to scale nicely for two levels of joins, although the queries
> become rather hard to follow in their raw form (I used a custom
> XML-to-query transformer to actually formulate more complex queries).
>
> So my question is that: can I express an equivalent query using the
> query DSL? I know I can use 'bool' like that:
>
> {
>"query": {
>   "bool" : {
>  "must" : [ ... ];
>  "should" : [ ... ]
>}
> }
>   }
>
> But how do I actually go from 'x AND (y OR z)' to 'bool' in the query
> DSL? I seem to lose the nice compositional properties of the expression.
> Here, for example, the expression implies that at least 'y' or 'z' must
> match; I don't quite see how I can express this in the DSL.
>
> Kind regards,
> Mikhail
>


-- 
Sincerely yours
Mikhail Khludnev
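[Spelling that answer out end to end: in the JSON Query DSL each clause of
must/should can be either a query string or a nested object, and a bool that
contains only should clauses requires at least one of them to match, which is
exactly the (y OR z) part. A minimal sketch with hypothetical field names:

    {
      "query": {
        "bool": {
          "must": [
            "type:Person",
            { "bool": { "should": [ "name_t:smith", "address_t:london" ] } }
          ]
        }
      }
    }

This reproduces x AND (y OR z); the {!edismax} and {!join} sub-queries from
the original question can be dropped into the same slots as strings.]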


Can I express this nested query in JSON DSL?

2020-12-08 Thread Mikhail Edoshin

Hi,

I'm more or less new to Solr. I need to run queries that use joins all 
over the place. (The idea is to index database records pretty much as 
they are and then query them in interesting ways and, most importantly, 
get the rank. Our dataset is not too large so the performance is great.)


I managed to express the logic using the following approach. For 
example, I want to search people by their names or addresses:


  q=type:Person^=0 AND ({!edismax qf= v=$p0} OR {!join 
 v=$p1})

  p1={!edismax qf= v=p0}
  p0=

(Here 'type:Person' works as a filter so I zero its score.) This seems 
to work as expected and give the right results and ranking. It also 
seems to scale nicely for two levels of joins, although the queries 
become rather hard to follow in their raw form (I used a custom 
XML-to-query transformer to actually formulate more complex queries).


So my question is that: can I express an equivalent query using the 
query DSL? I know I can use 'bool' like that:


{
  "query": {
    "bool": {
      "must": [ ... ],
      "should": [ ... ]
    }
  }
}

But how do I actually go from 'x AND (y OR z)' to 'bool' in the query 
DSL? I seem to lose the nice compositional properties of the expression. 
Here, for example, the expression implies that at least 'y' or 'z' must 
match; I don't quite see how I can express this in the DSL.


Kind regards,
Mikhail


Re: What's the most efficient way to check if there are any matches for a query?

2020-12-07 Thread Colvin Cowie
Thanks for the suggestions. At some point I'll have to actually put it to
the test and see what impact everything has.

Cheers

On Sat, 5 Dec 2020 at 13:31, Erick Erickson  wrote:

> Have you looked at the Term Query Parser (_not_ the TermS Query Parser)
> or Raw Query Parser?
>
> https://lucene.apache.org/solr/guide/8_4/other-parsers.html
>
> NOTE: these perform _no_ analysis, so you have to give them the exact
> term...
>
> These are pretty low level, and if they’re “fast enough” you won’t have to
> do
> any work. You could do some Lucene-level coding I suspect to improve that,
> depends on whether you think those are fast enough…
>
> Best,
> Erick
>
>
> > On Dec 5, 2020, at 5:04 AM, Colvin Cowie 
> wrote:
> >
> > Hello,
> >
> > I was just wondering. If I don't care about the number of matches for a
> > query, let alone what the matches are, just that there is *at least 1*
> > match for a query, what's the most efficient way to execute that query
> (on
> > the /select handler)? (Using Solr 8.7)
> >
> > As a general approach for a query is "rows=0&sort=id asc" the best I can
> > do? Is there a more aggressive short circuit that will stop a searcher as
> > soon as it finds a match?
> >
> > For a specific case where the query is for a single exact term in an
> > indexed field (with or without doc values) is there a different answer?
> >
> > Thanks for any suggestions
>
>


Re: What's the most efficient way to check if there are any matches for a query?

2020-12-05 Thread Erick Erickson
Have you looked at the Term Query Parser (_not_ the TermS Query Parser)
or Raw Query Parser? 

https://lucene.apache.org/solr/guide/8_4/other-parsers.html

NOTE: these perform _no_ analysis, so you have to give them the exact term...

These are pretty low level, and if they’re “fast enough” you won’t have to do
any work. You could do some Lucene-level coding I suspect to improve that,
depends on whether you think those are fast enough…

Best,
Erick


> On Dec 5, 2020, at 5:04 AM, Colvin Cowie  wrote:
> 
> Hello,
> 
> I was just wondering. If I don't care about the number of matches for a
> query, let alone what the matches are, just that there is *at least 1*
> match for a query, what's the most efficient way to execute that query (on
> the /select handler)? (Using Solr 8.7)
> 
> As a general approach for a query is "rows=0&sort=id asc" the best I can
> do? Is there a more aggressive short circuit that will stop a searcher as
> soon as it finds a match?
> 
> For a specific case where the query is for a single exact term in an
> indexed field (with or without doc values) is there a different answer?
> 
> Thanks for any suggestions



What's the most efficient way to check if there are any matches for a query?

2020-12-05 Thread Colvin Cowie
Hello,

I was just wondering. If I don't care about the number of matches for a
query, let alone what the matches are, just that there is *at least 1*
match for a query, what's the most efficient way to execute that query (on
the /select handler)? (Using Solr 8.7)

As a general approach for a query is "rows=0&sort=id asc" the best I can
do? Is there a more aggressive short circuit that will stop a searcher as
soon as it finds a match?

For a specific case where the query is for a single exact term in an
indexed field (with or without doc values) is there a different answer?

Thanks for any suggestions


Re: nested facets of query and terms type in JSON format

2020-12-03 Thread Arturas Mazeika
Hi Michael,

I wish I were able to do a percent of what you are doing. Where does your
inspiration come from? It is not from the manuals, cause I've checked
those. How do you come up with this piece of art? Did you check this from
the source code? Which lines revealed these secrets? I am eternally
grateful for your help!

Michael, maybe you happen to know how I can plug in the facet.threads
parameter in that JSON body below, so the query uses more threads to
compute the answer? I am dying out of curiosity.

Cheers,
Arturas

On Thu, Dec 3, 2020 at 7:59 PM Michael Gibney 
wrote:

> I think the first "error" case in your set of examples above is closest to
> being correct. For "query" facet type, I think you want to explicitly
> specify `"type":"query"`, and specify the query itself in the `"q"` param,
> i.e.:
> {
> "query"  : "*:*",
> "limit"  : 0,
>
> "facet": {
> "aip": {
> "type":  "query",
> "q":  "cfname2:aip",
> "facet": {
> "t_buckets": {
> "type":  "range",
> "field": "t",
> "sort": { "t": "asc" },
> "start": "2018-05-02T17:00:00.000Z",
> "end":   "2020-11-16T21:00:00.000Z",
> "gap":   "+1HOUR"
> "limit": 1
> }
> }
> }
> }
> }
>
> On Thu, Dec 3, 2020 at 12:59 PM Arturas Mazeika  wrote:
>
> > Hi Michael,
> >
> > Thanks for helping me to figure this out.
> >
> > If I fire:
> >
> > {
> > "query"  : "*:*",
> > "limit"  : 0,
> >
> > "facet": {
> > "aip": { "query":  "cfname2:aip", }
> >
> > }
> > }
> >
> > I get
> >
> > "response": { "numFound": 20560849, "start": 0, "numFoundExact": true,
> > "docs": [] }, "facets": { "count": 20560849, "aip": { "count": 2307 } } }
> >
> > (works). If I fire
> >
> >
> > {
> > "query"  : "*:*",
> >     "limit"  : 0,
> >
> > "facet": {
> > "t_buckets": {
> > "type":  "range",
> > "field": "t",
> > "sort": { "t": "asc" },
> > "start": "2018-05-02T17:00:00.000Z",
> > "end":   "2020-11-16T21:00:00.000Z",
> > "gap":   "+1HOUR"
> > "limit": 1
> > }
> > }
> > }
> >
> > I get
> >
> > "response": { "numFound": 20560849, "start": 0, "numFoundExact": true,
> > "docs": [] }, "facets": { "count": 20560849, "t_buckets": { "buckets": [
> {
> > "val": "2018-05-02T17:00:00Z", "count": 150 },
> >
> > (works). If I fire:
> >
> > {
> > "query"  : "*:*",
> > "limit"  : 0,
> >
> > "facet": {
> > "aip": { "query":  "cfname2:aip",
> >
> > "facet": {
> > "t_buckets": {
> > "type":  "range",
> > "field": "t",
> > "sort": { "t": "asc" },
> > "start": "2018-05-02T17:00:00.000Z",
> > "end":   "2020-11-16T21:00:00.000Z",
> > "gap":   "+1HOUR"
> > "limit": 1
> > }
> > }
> > }
> > }
> > }
> >
> > I get
> >
> > "error": { "metadata": [ "error-class",
> > "org.apache.solr.common.SolrException", "root-error-class",
> > "org.apache.solr.common.SolrException" ], "msg": "expected facet/stat
> type
> > name,

Re: nested facets of query and terms type in JSON format

2020-12-03 Thread Michael Gibney
I think the first "error" case in your set of examples above is closest to
being correct. For "query" facet type, I think you want to explicitly
specify `"type":"query"`, and specify the query itself in the `"q"` param,
i.e.:
{
  "query": "*:*",
  "limit": 0,
  "facet": {
    "aip": {
      "type": "query",
      "q": "cfname2:aip",
      "facet": {
        "t_buckets": {
          "type": "range",
          "field": "t",
          "sort": { "t": "asc" },
          "start": "2018-05-02T17:00:00.000Z",
          "end": "2020-11-16T21:00:00.000Z",
          "gap": "+1HOUR",
          "limit": 1
        }
      }
    }
  }
}

On Thu, Dec 3, 2020 at 12:59 PM Arturas Mazeika  wrote:

> Hi Michael,
>
> Thanks for helping me to figure this out.
>
> If I fire:
>
> {
>     "query"  : "*:*",
> "limit"  : 0,
>
> "facet": {
> "aip": { "query":  "cfname2:aip", }
>
> }
> }
>
> I get
>
> "response": { "numFound": 20560849, "start": 0, "numFoundExact": true,
> "docs": [] }, "facets": { "count": 20560849, "aip": { "count": 2307 } } }
>
> (works). If I fire
>
>
> {
> "query"  : "*:*",
> "limit"  : 0,
>
> "facet": {
> "t_buckets": {
>     "type":  "range",
> "field": "t",
> "sort": { "t": "asc" },
> "start": "2018-05-02T17:00:00.000Z",
> "end":   "2020-11-16T21:00:00.000Z",
> "gap":   "+1HOUR"
> "limit": 1
> }
> }
> }
>
> I get
>
> "response": { "numFound": 20560849, "start": 0, "numFoundExact": true,
> "docs": [] }, "facets": { "count": 20560849, "t_buckets": { "buckets": [ {
> "val": "2018-05-02T17:00:00Z", "count": 150 },
>
> (works). If I fire:
>
> {
>     "query"  : "*:*",
> "limit"  : 0,
>
> "facet": {
> "aip": { "query":  "cfname2:aip",
>
> "facet": {
> "t_buckets": {
> "type":  "range",
> "field": "t",
> "sort": { "t": "asc" },
> "start": "2018-05-02T17:00:00.000Z",
> "end":   "2020-11-16T21:00:00.000Z",
> "gap":   "+1HOUR"
> "limit": 1
> }
> }
> }
> }
> }
>
> I get
>
> "error": { "metadata": [ "error-class",
> "org.apache.solr.common.SolrException", "root-error-class",
> "org.apache.solr.common.SolrException" ], "msg": "expected facet/stat type
> name, like {type:range, field:price, ...} but got null , path=/facet",
> "code": 400 } }
>
> If I fire
>
> {
> "query"  : "*:*",
>     "limit"  : 0,
>
> "facet": {
> "aip": { "query":  "cfname2:aip",
>
> "facet": {
> "type":  "range",
>     "field": "t",
> "sort": { "t": "asc" },
> "start": "2018-05-02T17:00:00.000Z",
> "end":   "2020-11-16T21:00:00.000Z",
> "gap":   "+1HOUR"
> "limit": 1
> }
> }
> }
> }
>
> I get
>
> "error": { "metadata": [ "error-class",
> "org.apache.solr.common.SolrException", "root-error-class",
> "org.apache.solr.common.SolrException" ], "msg&

Re: nested facets of query and terms type in JSON format

2020-12-03 Thread Arturas Mazeika
Hi Michael,

Thanks for helping me to figure this out.

If I fire:

{
"query"  : "*:*",
"limit"  : 0,

    "facet": {
"aip": { "query":  "cfname2:aip", }

}
}

I get

"response": { "numFound": 20560849, "start": 0, "numFoundExact": true,
"docs": [] }, "facets": { "count": 20560849, "aip": { "count": 2307 } } }

(works). If I fire


{
"query"  : "*:*",
"limit"  : 0,

"facet": {
"t_buckets": {
"type":  "range",
"field": "t",
"sort": { "t": "asc" },
"start": "2018-05-02T17:00:00.000Z",
"end":   "2020-11-16T21:00:00.000Z",
"gap":   "+1HOUR"
    "limit": 1
}
}
}

I get

"response": { "numFound": 20560849, "start": 0, "numFoundExact": true,
"docs": [] }, "facets": { "count": 20560849, "t_buckets": { "buckets": [ {
"val": "2018-05-02T17:00:00Z", "count": 150 },

(works). If I fire:

{
"query"  : "*:*",
"limit"  : 0,

"facet": {
"aip": { "query":  "cfname2:aip",

"facet": {
"t_buckets": {
"type":  "range",
"field": "t",
"sort": { "t": "asc" },
"start": "2018-05-02T17:00:00.000Z",
"end":   "2020-11-16T21:00:00.000Z",
"gap":   "+1HOUR"
"limit": 1
}
}
}
}
}

I get

"error": { "metadata": [ "error-class",
"org.apache.solr.common.SolrException", "root-error-class",
"org.apache.solr.common.SolrException" ], "msg": "expected facet/stat type
name, like {type:range, field:price, ...} but got null , path=/facet",
"code": 400 } }

If I fire

{
"query"  : "*:*",
"limit"  : 0,

"facet": {
"aip": { "query":  "cfname2:aip",

"facet": {
"type":  "range",
"field": "t",
"sort": { "t": "asc" },
"start": "2018-05-02T17:00:00.000Z",
"end":   "2020-11-16T21:00:00.000Z",
"gap":   "+1HOUR"
"limit": 1
    }
}
}
}

I get

"error": { "metadata": [ "error-class",
"org.apache.solr.common.SolrException", "root-error-class",
"org.apache.solr.common.SolrException" ], "msg": "expected facet/stat type
name, like {type:range, field:price, ...} but got null , path=/facet",
"code": 400 } }

What else can I try out?

Cheers,
Arturas

On Thu, Dec 3, 2020 at 3:55 PM Michael Gibney 
wrote:

> Arturas,
> I think your syntax is wrong for the range subfacet? -- the configuration
> of the range facet should be directly under the `tt` key, rather than
> nested under `t_buckets` in the request. (The response introduces a
> "buckets" attribute that is not part of the request syntax).
> Michael
>
> On Thu, Dec 3, 2020 at 3:47 AM Arturas Mazeika  wrote:
>
> > Hi Solr Team,
> >
> > I am trying to check how I can formulate facet queries using JSON
> format. I
> > can successfully formulate query, range, term queries, as well as nested
> > term queries. How can I formulate a nested facet query involving "query"
> as
> > well as "range" formulations? The following does not work:
> >
> >
> > GET http://localhost:/solr/db/query HTTP/1.1
> > content-type: application/json
> >
> > {
> > "query"  : "*:*",
> > "limit"  : 0,
> > "facet": {
> > "a1": { "query":  "cfname2:1" },
> > "a2": { "query":  "cfname2:2" },
> > "a3": { "field":  "cfname2", "type":"terms", "prefix":"3" },
> > "a4": { "query":  "cfname2:4" },
> > "a5": { "query":  "cfname2:5" },
> > "a6": { "query":  "cfname2:6" },
> >
> > "tt": {
> > "t_buckets": {
> > "type":  "range",
> > "field": "t",
> > "sort": { "t": "asc" },
> > "start": "2018-05-02T17:00:00.000Z",
> > "end":   "2020-11-16T21:00:00.000Z",
> > "gap":   "+1HOUR"
> > }
> > }
> > }
> > }
> >
> > Single (not nested facets separately on individual queries as well as for
> > range) work in flying colors.
> >
> > Cheers,
> > Arturas
> >
>


Re: nested facets of query and terms type in JSON format

2020-12-03 Thread Michael Gibney
Arturas,
I think your syntax is wrong for the range subfacet? -- the configuration
of the range facet should be directly under the `tt` key, rather than
nested under `t_buckets` in the request. (The response introduces a
"buckets" attribute that is not part of the request syntax).
Michael

On Thu, Dec 3, 2020 at 3:47 AM Arturas Mazeika  wrote:

> Hi Solr Team,
>
> I am trying to check how I can formulate facet queries using JSON format. I
> can successfully formulate query, range, term queries, as well as nested
> term queries. How can I formulate a nested facet query involving "query" as
> well as "range" formulations? The following does not work:
>
>
> GET http://localhost:/solr/db/query HTTP/1.1
> content-type: application/json
>
> {
> "query"  : "*:*",
> "limit"  : 0,
> "facet": {
> "a1": { "query":  "cfname2:1" },
> "a2": { "query":  "cfname2:2" },
> "a3": { "field":  "cfname2", "type":"terms", "prefix":"3" },
> "a4": { "query":  "cfname2:4" },
> "a5": { "query":  "cfname2:5" },
> "a6": { "query":  "cfname2:6" },
>
> "tt": {
> "t_buckets": {
> "type":  "range",
> "field": "t",
> "sort": { "t": "asc" },
> "start": "2018-05-02T17:00:00.000Z",
> "end":   "2020-11-16T21:00:00.000Z",
> "gap":   "+1HOUR"
> }
> }
> }
> }
>
> Single (not nested facets separately on individual queries as well as for
> range) work in flying colors.
>
> Cheers,
> Arturas
>


nested facets of query and terms type in JSON format

2020-12-03 Thread Arturas Mazeika
Hi Solr Team,

I am trying to check how I can formulate facet queries using JSON format. I
can successfully formulate query, range, term queries, as well as nested
term queries. How can I formulate a nested facet query involving "query" as
well as "range" formulations? The following does not work:


GET http://localhost:/solr/db/query HTTP/1.1
content-type: application/json

{
"query"  : "*:*",
"limit"  : 0,
"facet": {
    "a1": { "query":  "cfname2:1" },
"a2": { "query":  "cfname2:2" },
    "a3": { "field":  "cfname2", "type":"terms", "prefix":"3" },
"a4": { "query":  "cfname2:4" },
"a5": { "query":  "cfname2:5" },
"a6": { "query":  "cfname2:6" },

"tt": {
"t_buckets": {
"type":  "range",
"field": "t",
"sort": { "t": "asc" },
"start": "2018-05-02T17:00:00.000Z",
"end":   "2020-11-16T21:00:00.000Z",
"gap":   "+1HOUR"
}
}
}
}

Single (not nested facets separately on individual queries as well as for
range) work in flying colors.

Cheers,
Arturas


Re: Solr 8.4.1, NOT NULL query not working on plong & pint type fields (fieldname:* )

2020-11-26 Thread Deepu
Hi Shawn,
Thanks for taking time and replay.

Thanks,
Deepu

On Thu, Nov 26, 2020 at 10:53 PM Shawn Heisey  wrote:

> On 11/25/2020 10:42 AM, Deepu wrote:
> > We are in the process of migrating from Solr 5 to Solr 8, during testing
> > identified that "Not null" queries on plong & pint field types are not
> > giving any results, it is working fine with solr 5.4 version.
> >
> > could you please let me know if you have suggestions on this issue?
>
> Here's a couple of facts:
>
> 1) Points-based fields have certain limitations that make explicit value
> lookups very slow, and make them unsuitable for use on uniqueKey fields.
>   Something about the field not having a "term" available.
>
> 2) A query of the type "fieldname:*" is a wildcard query.  These tend to
> be slow and inefficient, when they work.
>
> It might be that the limitations of point-based fields make it so that
> wildcard queries don't work.  I have no idea here.  Points-based fields
> did not exist in Solr 5.4, chances are that you were using a Trie-based
> field at that time.  A wildcard query would have worked, but it would
> have been slow.
>
> I may have a solution even though I am pretty clueless about what's
> going on.  When you are looking to do a NOT NULL sort of query, you
> should do it as a range query rather than a wildcard query.  This means
> the following syntax.   Note that it is case sensitive -- the "TO" must
> be uppercase:
>
> fieldname:[* TO *]
>
> This is how all NOT NULL queries should be constructed, regardless of
> the type of field.  Range queries tend to be very efficient.
>
> Thanks,
> Shawn
>


Re: Solr 8.4.1, NOT NULL query not working on plong & pint type fields (fieldname:* )

2020-11-26 Thread Shawn Heisey

On 11/25/2020 10:42 AM, Deepu wrote:

We are in the process of migrating from Solr 5 to Solr 8, during testing
identified that "Not null" queries on plong & pint field types are not
giving any results, it is working fine with solr 5.4 version.

could you please let me know if you have suggestions on this issue?


Here's a couple of facts:

1) Points-based fields have certain limitations that make explicit value 
lookups very slow, and make them unsuitable for use on uniqueKey fields. 
 Something about the field not having a "term" available.


2) A query of the type "fieldname:*" is a wildcard query.  These tend to 
be slow and inefficient, when they work.


It might be that the limitations of point-based fields make it so that 
wildcard queries don't work.  I have no idea here.  Points-based fields 
did not exist in Solr 5.4, chances are that you were using a Trie-based 
field at that time.  A wildcard query would have worked, but it would 
have been slow.


I may have a solution even though I am pretty clueless about what's 
going on.  When you are looking to do a NOT NULL sort of query, you 
should do it as a range query rather than a wildcard query.  This means 
the following syntax.   Note that it is case sensitive -- the "TO" must 
be uppercase:


fieldname:[* TO *]

This is how all NOT NULL queries should be constructed, regardless of 
the type of field.  Range queries tend to be very efficient.


Thanks,
Shawn
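[To complete the picture, the negated form of the same range query answers
the opposite, IS NULL, question:

    fieldname:[* TO *]     finds documents where fieldname has a value
    -fieldname:[* TO *]    finds documents where fieldname is missing

If a purely negative query misbehaves in some context, the safe spelling is
*:* -fieldname:[* TO *].]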


Re: Query generation is different for search terms with and without "-"

2020-11-25 Thread Walter Underwood
Ages ago at Netflix, I fixed this with a few hundred synonyms. If you are 
working with
a fixed vocabulary (movie titles, product names), that can work just fine.

babysitter, baby-sitter, baby sitter
fullmetal, full-metal, full metal
manhunter, man-hunter, man hunter
spiderman, spider-man, spider man

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Nov 25, 2020, at 9:26 AM, Erick Erickson  wrote:
> 
> Parameters, no. You could use a PatternReplaceCharFilterFactory. NOTE:
> 
> *FilterFactory are _not_ what you want in this case, they are applied to 
> individual tokens after parsing
> 
> *CharFilterFactory are invoked on the entire input to the field, although I 
> can’t say for certain that even that’s early enough.
> 
> There are two other options to consider:
> StatelessScriptUpdateProcessor
> FieldMutatingUpdateProcessor
> 
> Stateless... is probably easiest…
> 
> Best,
> Erick
> 
>> On Nov 24, 2020, at 1:44 PM, Samuel Gutierrez 
>>  wrote:
>> 
>> Are there any good workarounds/parameters we can use to fix this so it
>> doesn't have to be solved client side?
>> 
>> On Tue, Nov 24, 2020 at 7:50 AM matthew sporleder 
>> wrote:
>> 
>>> Is the normal/standard solution here to regex remove the '-'s and
>>> combine them into a single token?
>>> 
>>> On Tue, Nov 24, 2020 at 8:00 AM Erick Erickson 
>>> wrote:
>>>> 
>>>> This is a common point of confusion. There are two phases for creating a
>>> query,
>>>> query _parsing_ first, then the analysis chain for the parsed result.
>>>> 
>>>> So what e-dismax sees in the two cases is:
>>>> 
>>>> Name_enUS:“high tech” -> two tokens, since there are two of them pf2
>>> comes into play.
>>>> 
>>>> Name_enUS:“high-tech” -> there’s only one token so pf2 doesn’t apply,
>>> splitting it on the hyphen comes later.
>>>> 
>>>> It’s especially confusing since the field analysis then breaks up
>>> “high-tech” into two tokens that
>>>> look the same as “high tech” in the debug response, just without the
>>> phrase query.
>>>> 
>>>> Name_enUS:high
>>>> Name_enUS:tech
>>>> 
>>>> Best,
>>>> Erick
>>>> 
>>>>> On Nov 23, 2020, at 8:32 PM, Samuel Gutierrez <
>>> samuel.gutier...@iherb.com.INVALID> wrote:
>>>>> 
>>>>> I am troubleshooting an issue with ranking for search terms that
>>> contain a
>>>>> "-" vs the same query that does not contain the dash e.g. "high-tech"
>>> vs
>>>>> "high tech". The field that I am querying is using the standard
>>> tokenizer,
>>>>> so I would expect that the underlying lucene query should be the same
>>> for
>>>>> both versions of the query, however when printing the debug, it appears
>>>>> they are generated differently. I know "-" must be escaped as it has
>>>>> special meaning in lucene, however escaping does not fix the problem.
>>> It
>>>>> appears that with the "-" present, the pf2 edismax parameter is not
>>>>> respected and omitted from the final query. We use sow=false as we have
>>>>> multiterm synonyms and need to ensure they are included in the final
>>> lucene
>>>>> query. My expectation is that the final underlying lucene query should
>>> be
>>>>> based on the output  of the field analyzer, however after briefly
>>> looking
>>>>> at the code for ExtendedDismaxQParser, it appears that there is some
>>> string
>>>>> processing happening outside of the analysis step which causes the
>>>>> unexpected lucene query.
>>>>> 
>>>>> 
>>>>> Solr Debug for "high tech":
>>>>> 
>>>>> parsedquery: "+(DisjunctionMaxQuery((Name_enUS:high)~0.4)
>>>>> DisjunctionMaxQuery((Name_enUS:tech)~0.4))~2
>>>>> DisjunctionMaxQuery((Name_enUS:"high tech"~5)~0.4)
>>>>> DisjunctionMaxQuery((Name_enUS:"high tech"~4)~0.4)",
>>>>> parsedquery_toString: "+(((Name_enUS:high)~0.4
>>>>> (Name_enUS:tech)~0.4)~2) (Name_enUS:"high tech"~5)~0.4
>>>>> (Name_enUS:"high tech"~4)~0.4",
>>>>> 
>>>>> 
>>>>> Solr Debug for "high-tec

Solr 8.4.1, NOT NULL query not working on plong & pint type fields (fieldname:* )

2020-11-25 Thread Deepu
Dear Team,

We are in the process of migrating from Solr 5 to Solr 8, during testing
identified that "Not null" queries on plong & pint field types are not
giving any results, it is working fine with solr 5.4 version.

could you please let me know if you have suggestions on this issue?

Thanks
Deepu


Re: Query generation is different for search terms with and without "-"

2020-11-25 Thread Erick Erickson
Parameters, no. You could use a PatternReplaceCharFilterFactory. NOTE:

*FilterFactory are _not_ what you want in this case, they are applied to 
individual tokens after parsing

*CharFilterFactory are invoked on the entire input to the field, although I 
can’t say for certain that even that’s early enough.

There are two other options to consider:
StatelessScriptUpdateProcessor
FieldMutatingUpdateProcessor

Stateless... is probably easiest…

Best,
Erick

> On Nov 24, 2020, at 1:44 PM, Samuel Gutierrez 
>  wrote:
> 
> Are there any good workarounds/parameters we can use to fix this so it
> doesn't have to be solved client side?
> 
> On Tue, Nov 24, 2020 at 7:50 AM matthew sporleder 
> wrote:
> 
>> Is the normal/standard solution here to regex remove the '-'s and
>> combine them into a single token?
>> 
>> On Tue, Nov 24, 2020 at 8:00 AM Erick Erickson 
>> wrote:
>>> 
>>> This is a common point of confusion. There are two phases for creating a
>> query,
>>> query _parsing_ first, then the analysis chain for the parsed result.
>>> 
>>> So what e-dismax sees in the two cases is:
>>> 
>>> Name_enUS:“high tech” -> two tokens, since there are two of them pf2
>> comes into play.
>>> 
>>> Name_enUS:“high-tech” -> there’s only one token so pf2 doesn’t apply,
>> splitting it on the hyphen comes later.
>>> 
>>> It’s especially confusing since the field analysis then breaks up
>> “high-tech” into two tokens that
>>> look the same as “high tech” in the debug response, just without the
>> phrase query.
>>> 
>>> Name_enUS:high
>>> Name_enUS:tech
>>> 
>>> Best,
>>> Erick
>>> 
>>>> On Nov 23, 2020, at 8:32 PM, Samuel Gutierrez <
>> samuel.gutier...@iherb.com.INVALID> wrote:
>>>> 
>>>> I am troubleshooting an issue with ranking for search terms that
>> contain a
>>>> "-" vs the same query that does not contain the dash e.g. "high-tech"
>> vs
>>>> "high tech". The field that I am querying is using the standard
>> tokenizer,
>>>> so I would expect that the underlying lucene query should be the same
>> for
>>>> both versions of the query, however when printing the debug, it appears
>>>> they are generated differently. I know "-" must be escaped as it has
>>>> special meaning in lucene, however escaping does not fix the problem.
>> It
>>>> appears that with the "-" present, the pf2 edismax parameter is not
>>>> respected and omitted from the final query. We use sow=false as we have
>>>> multiterm synonyms and need to ensure they are included in the final
>> lucene
>>>> query. My expectation is that the final underlying lucene query should
>> be
>>>> based on the output  of the field analyzer, however after briefly
>> looking
>>>> at the code for ExtendedDismaxQParser, it appears that there is some
>> string
>>>> processing happening outside of the analysis step which causes the
>>>> unexpected lucene query.
>>>> 
>>>> 
>>>> Solr Debug for "high tech":
>>>> 
>>>> parsedquery: "+(DisjunctionMaxQuery((Name_enUS:high)~0.4)
>>>> DisjunctionMaxQuery((Name_enUS:tech)~0.4))~2
>>>> DisjunctionMaxQuery((Name_enUS:"high tech"~5)~0.4)
>>>> DisjunctionMaxQuery((Name_enUS:"high tech"~4)~0.4)",
>>>> parsedquery_toString: "+(((Name_enUS:high)~0.4
>>>> (Name_enUS:tech)~0.4)~2) (Name_enUS:"high tech"~5)~0.4
>>>> (Name_enUS:"high tech"~4)~0.4",
>>>> 
>>>> 
>>>> Solr Debug for "high-tech"
>>>> 
>>>> parsedquery: "+DisjunctionMaxQueryName_enUS:high
>>>> Name_enUS:tech)~2))~0.4) DisjunctionMaxQuery((Name_enUS:"high
>>>> tech"~5)~0.4)",
>>>> parsedquery_toString: "+(((Name_enUS:high Name_enUS:tech)~2))~0.4
>>>> (Name_enUS:"high tech"~5)~0.4"
>>>> 
>>>> SolrConfig:
>>>> 
>>>> 
>>>>   
>>>> true
>>>> true
>>>> json
>>>> 375%
>>>> Name_enUS
>>>> Name_enUS
>>>> 5
>>>> Name_enUS
>>>> 4   
>>>> 3
>>>> 0.4
>>>> explicit
>>>> 100
>>>> false
>>>>   
>>>>   
>>>> edismax
>>>>   
>>>> 
>>>> 
>>>> Schema:
>>>> 
>>>> > positionIncrementGap="100">
>>>> 
>>>>   
>>>>   
>>>>   
>>>>   
>>>> 
>>>> 
>>>> 
>>>> 
>>>> Using Solr 8.6.3
>>>> 
>> 
> 



Re: Query generation is different for search terms with and without "-"

2020-11-24 Thread Samuel Gutierrez
Are there any good workarounds/parameters we can use to fix this so it
doesn't have to be solved client side?

On Tue, Nov 24, 2020 at 7:50 AM matthew sporleder 
wrote:

> Is the normal/standard solution here to regex remove the '-'s and
> combine them into a single token?
>
> On Tue, Nov 24, 2020 at 8:00 AM Erick Erickson 
> wrote:
> >
> > This is a common point of confusion. There are two phases for creating a
> query,
> > query _parsing_ first, then the analysis chain for the parsed result.
> >
> > So what e-dismax sees in the two cases is:
> >
> > Name_enUS:“high tech” -> two tokens, since there are two of them pf2
> comes into play.
> >
> > Name_enUS:“high-tech” -> there’s only one token so pf2 doesn’t apply,
> splitting it on the hyphen comes later.
> >
> > It’s especially confusing since the field analysis then breaks up
> “high-tech” into two tokens that
> > look the same as “high tech” in the debug response, just without the
> phrase query.
> >
> > Name_enUS:high
> > Name_enUS:tech
> >
> > Best,
> > Erick
> >
> > > On Nov 23, 2020, at 8:32 PM, Samuel Gutierrez <
> samuel.gutier...@iherb.com.INVALID> wrote:
> > >
> > > I am troubleshooting an issue with ranking for search terms that
> contain a
> > > "-" vs the same query that does not contain the dash e.g. "high-tech"
> vs
> > > "high tech". The field that I am querying is using the standard
> tokenizer,
> > > so I would expect that the underlying lucene query should be the same
> for
> > > both versions of the query, however when printing the debug, it appears
> > > they are generated differently. I know "-" must be escaped as it has
> > > special meaning in lucene, however escaping does not fix the problem.
> It
> > > appears that with the "-" present, the pf2 edismax parameter is not
> > > respected and omitted from the final query. We use sow=false as we have
> > > multiterm synonyms and need to ensure they are included in the final
> lucene
> > > query. My expectation is that the final underlying lucene query should
> be
> > > based on the output  of the field analyzer, however after briefly
> looking
> > > at the code for ExtendedDismaxQParser, it appears that there is some
> string
> > > processing happening outside of the analysis step which causes the
> > > unexpected lucene query.
> > >
> > >
> > > Solr Debug for "high tech":
> > >
> > > parsedquery: "+(DisjunctionMaxQuery((Name_enUS:high)~0.4)
> > > DisjunctionMaxQuery((Name_enUS:tech)~0.4))~2
> > > DisjunctionMaxQuery((Name_enUS:"high tech"~5)~0.4)
> > > DisjunctionMaxQuery((Name_enUS:"high tech"~4)~0.4)",
> > > parsedquery_toString: "+(((Name_enUS:high)~0.4
> > > (Name_enUS:tech)~0.4)~2) (Name_enUS:"high tech"~5)~0.4
> > > (Name_enUS:"high tech"~4)~0.4",
> > >
> > >
> > > Solr Debug for "high-tech"
> > >
> > > parsedquery: "+DisjunctionMaxQueryName_enUS:high
> > > Name_enUS:tech)~2))~0.4) DisjunctionMaxQuery((Name_enUS:"high
> > > tech"~5)~0.4)",
> > > parsedquery_toString: "+(((Name_enUS:high Name_enUS:tech)~2))~0.4
> > > (Name_enUS:"high tech"~5)~0.4"
> > >
> > > SolrConfig:
> > >
> > >  
> > >
> > >  true
> > >  true
> > >  json
> > >  375%
> > >  Name_enUS
> > >  Name_enUS
> > >  5
> > >  Name_enUS
> > >  4   
> > >  3
> > >  0.4
> > >  explicit
> > >  100
> > >  false
> > >
> > >
> > >  edismax
> > >
> > >  
> > >
> > > Schema:
> > >
> > >   positionIncrementGap="100">
> > >  
> > >
> > >
> > >
> > >
> > >  
> > >  
> > >
> > >
> > > Using Solr 8.6.3
> > >
>



Re: Query generation is different for search terms with and without "-"

2020-11-24 Thread matthew sporleder
Is the normal/standard solution here to regex remove the '-'s and
combine them into a single token?
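
(For reference, the regex itself can be a one-liner on the client -- a Java
sketch; replacing the intra-word hyphen with a space makes edismax see two
tokens again, so pf2 applies:)

  String normalized = rawQuery.replaceAll("(?<=\\w)-(?=\\w)", " ");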

On Tue, Nov 24, 2020 at 8:00 AM Erick Erickson  wrote:
>
> This is a common point of confusion. There are two phases for creating a 
> query,
> query _parsing_ first, then the analysis chain for the parsed result.
>
> So what e-dismax sees in the two cases is:
>
> Name_enUS:“high tech” -> two tokens, since there are two of them pf2 comes 
> into play.
>
> Name_enUS:“high-tech” -> there’s only one token so pf2 doesn’t apply, 
> splitting it on the hyphen comes later.
>
> It’s especially confusing since the field analysis then breaks up “high-tech” 
> into two tokens that
> look the same as “high tech” in the debug response, just without the phrase 
> query.
>
> Name_enUS:high
> Name_enUS:tech
>
> Best,
> Erick
>
> > On Nov 23, 2020, at 8:32 PM, Samuel Gutierrez 
> >  wrote:
> >
> > I am troubleshooting an issue with ranking for search terms that contain a
> > "-" vs the same query that does not contain the dash e.g. "high-tech" vs
> > "high tech". The field that I am querying is using the standard tokenizer,
> > so I would expect that the underlying lucene query should be the same for
> > both versions of the query, however when printing the debug, it appears
> > they are generated differently. I know "-" must be escaped as it has
> > special meaning in lucene, however escaping does not fix the problem. It
> > appears that with the "-" present, the pf2 edismax parameter is not
> > respected and omitted from the final query. We use sow=false as we have
> > multiterm synonyms and need to ensure they are included in the final lucene
> > query. My expectation is that the final underlying lucene query should be
> > based on the output  of the field analyzer, however after briefly looking
> > at the code for ExtendedDismaxQParser, it appears that there is some string
> > processing happening outside of the analysis step which causes the
> > unexpected lucene query.
> >
> >
> > Solr Debug for "high tech":
> >
> > parsedquery: "+(DisjunctionMaxQuery((Name_enUS:high)~0.4)
> > DisjunctionMaxQuery((Name_enUS:tech)~0.4))~2
> > DisjunctionMaxQuery((Name_enUS:"high tech"~5)~0.4)
> > DisjunctionMaxQuery((Name_enUS:"high tech"~4)~0.4)",
> > parsedquery_toString: "+(((Name_enUS:high)~0.4
> > (Name_enUS:tech)~0.4)~2) (Name_enUS:"high tech"~5)~0.4
> > (Name_enUS:"high tech"~4)~0.4",
> >
> >
> > Solr Debug for "high-tech"
> >
> > parsedquery: "+DisjunctionMaxQueryName_enUS:high
> > Name_enUS:tech)~2))~0.4) DisjunctionMaxQuery((Name_enUS:"high
> > tech"~5)~0.4)",
> > parsedquery_toString: "+(((Name_enUS:high Name_enUS:tech)~2))~0.4
> > (Name_enUS:"high tech"~5)~0.4"
> >
> > SolrConfig:
> >
> >  
> >
> >  true
> >  true
> >  json
> >  375%
> >  Name_enUS
> >  Name_enUS
> >  5
> >  Name_enUS
> >  4   
> >  3
> >  0.4
> >  explicit
> >  100
> >  false
> >
> >
> >  edismax
> >
> >  
> >
> > Schema:
> >
> >   > positionIncrementGap="100">
> >  
> >
> >
> >
> >
> >  
> >  
> >
> >
> > Using Solr 8.6.3
> >


Re: Query generation is different for search terms with and without "-"

2020-11-24 Thread Erick Erickson
This is a common point of confusion. There are two phases for creating a query,
query _parsing_ first, then the analysis chain for the parsed result.

So what e-dismax sees in the two cases is:

Name_enUS:“high tech” -> two tokens, since there are two of them pf2 comes into 
play.

Name_enUS:“high-tech” -> there’s only one token so pf2 doesn’t apply, splitting 
it on the hyphen comes later.

It’s especially confusing since the field analysis then breaks up “high-tech” 
into two tokens that
look the same as “high tech” in the debug response, just without the phrase 
query.

Name_enUS:high
Name_enUS:tech

Best,
Erick

> On Nov 23, 2020, at 8:32 PM, Samuel Gutierrez 
>  wrote:
> 
> I am troubleshooting an issue with ranking for search terms that contain a
> "-" vs the same query that does not contain the dash e.g. "high-tech" vs
> "high tech". The field that I am querying is using the standard tokenizer,
> so I would expect that the underlying lucene query should be the same for
> both versions of the query, however when printing the debug, it appears
> they are generated differently. I know "-" must be escaped as it has
> special meaning in lucene, however escaping does not fix the problem. It
> appears that with the "-" present, the pf2 edismax parameter is not
> respected and omitted from the final query. We use sow=false as we have
> multiterm synonyms and need to ensure they are included in the final lucene
> query. My expectation is that the final underlying lucene query should be
> based on the output  of the field analyzer, however after briefly looking
> at the code for ExtendedDismaxQParser, it appears that there is some string
> processing happening outside of the analysis step which causes the
> unexpected lucene query.
> 
> 
> Solr Debug for "high tech":
> 
> parsedquery: "+(DisjunctionMaxQuery((Name_enUS:high)~0.4)
> DisjunctionMaxQuery((Name_enUS:tech)~0.4))~2
> DisjunctionMaxQuery((Name_enUS:"high tech"~5)~0.4)
> DisjunctionMaxQuery((Name_enUS:"high tech"~4)~0.4)",
> parsedquery_toString: "+(((Name_enUS:high)~0.4
> (Name_enUS:tech)~0.4)~2) (Name_enUS:"high tech"~5)~0.4
> (Name_enUS:"high tech"~4)~0.4",
> 
> 
> Solr Debug for "high-tech"
> 
> parsedquery: "+DisjunctionMaxQueryName_enUS:high
> Name_enUS:tech)~2))~0.4) DisjunctionMaxQuery((Name_enUS:"high
> tech"~5)~0.4)",
> parsedquery_toString: "+(((Name_enUS:high Name_enUS:tech)~2))~0.4
> (Name_enUS:"high tech"~5)~0.4"
> 
> SolrConfig:
> 
>  
>
>  true
>  true
>  json
>  375%
>  Name_enUS
>  Name_enUS
>  5
>  Name_enUS
>  4   
>  3
>  0.4
>  explicit
>  100
>  false
>
>
>  edismax
>
>  
> 
> Schema:
> 
>  
>  
>
>
>
>
>  
>  
> 
> 
> Using Solr 8.6.3
> 



RE: Use stream result like a query (alternative to innerJoin)

2020-11-24 Thread ufuk yılmaz
Fetch would work for my specific case (since I’m working with ids there’s no
one-to-many), if I were able to restrict fetch’s target domain with a query. I
would first get all possible deleted ids, then use fetch against the items
collection. But then the current fetch implementation would find all deleted
items, not something like “deleted items with these names” or “deleted items
between these times” etc.

I came upon your video while researching this stuff: 
https://www.youtube.com/watch?v=kTNe3TaqFvo

I’m trying to use the “let” expression to feed one stream’s result to another 
as a query, using string concat function and eval stream. So far I couldn’t 
write a working example, but it’s an idea that I’m playing with.


Sent from Mail for Windows 10

From: Joel Bernstein
Sent: 23 November 2020 23:23
To: solr-user@lucene.apache.org
Subject: Re: Use stream result like a query (alternative to innerJoin)




Query generation is different for search terms with and without "-"

2020-11-23 Thread Samuel Gutierrez
I am troubleshooting an issue with ranking for search terms that contain a
"-" vs the same query that does not contain the dash e.g. "high-tech" vs
"high tech". The field that I am querying is using the standard tokenizer,
so I would expect that the underlying lucene query should be the same for
both versions of the query, however when printing the debug, it appears
they are generated differently. I know "-" must be escaped as it has
special meaning in lucene, however escaping does not fix the problem. It
appears that with the "-" present, the pf2 edismax parameter is not
respected and omitted from the final query. We use sow=false as we have
multiterm synonyms and need to ensure they are included in the final lucene
query. My expectation is that the final underlying lucene query should be
based on the output  of the field analyzer, however after briefly looking
at the code for ExtendedDismaxQParser, it appears that there is some string
processing happening outside of the analysis step which causes the
unexpected lucene query.


Solr Debug for "high tech":

parsedquery: "+(DisjunctionMaxQuery((Name_enUS:high)~0.4)
DisjunctionMaxQuery((Name_enUS:tech)~0.4))~2
DisjunctionMaxQuery((Name_enUS:"high tech"~5)~0.4)
DisjunctionMaxQuery((Name_enUS:"high tech"~4)~0.4)",
parsedquery_toString: "+(((Name_enUS:high)~0.4
(Name_enUS:tech)~0.4)~2) (Name_enUS:"high tech"~5)~0.4
(Name_enUS:"high tech"~4)~0.4",


Solr Debug for "high-tech"

parsedquery: "+DisjunctionMaxQueryName_enUS:high
Name_enUS:tech)~2))~0.4) DisjunctionMaxQuery((Name_enUS:"high
tech"~5)~0.4)",
parsedquery_toString: "+(((Name_enUS:high Name_enUS:tech)~2))~0.4
(Name_enUS:"high tech"~5)~0.4"

SolrConfig:

  

  true
  true
  json
  375%
  Name_enUS
  Name_enUS
  5
  Name_enUS
  4   
  3
  0.4
  explicit
  100
  false


  edismax

  

Schema:

[fieldType definition stripped by the list archive]


Using Solr 8.6.3



Re: Use stream result like a query (alternative to innerJoin)

2020-11-23 Thread Joel Bernstein
Here is the documentation for fetch:

https://lucene.apache.org/solr/guide/8_4/stream-decorator-reference.html#fetch
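
For the deleted-items case in this thread, the shape would be roughly (a
sketch, reusing the field names from ufuk's example; fetch does the lookups
in batches instead of streaming the whole items collection):

  fetch(items,
        search(deletedItems,
               q="*:*",
               fl="deletedItemId",
               sort="deletedItemId asc"),
        fl="name",
        on="deletedItemId=id",
        batchSize=250)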


Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, Nov 23, 2020 at 3:22 PM Joel Bernstein  wrote:

> There are two streams that behave like that.
>
> One is the "nodes" expression, which is not going to work for this use
> case because it does everything in memory.
>
> The second one is the "fetch" expression which behaves like a nested loop
> join with some limitations. Unfortunately the main limitation is likely to
> be a blocker for you which is that it doesn't support one-to-many joins yet.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Sun, Nov 22, 2020 at 10:37 AM ufuk yılmaz 
> wrote:
>
>> Hi all,
>>
>> I’m looking for a way to query two collections and find documents that
>> exist in both, I know this can be done with innerJoin streaming expression
>> but I want to avoid it, since one of the collection streams can possibly
>> have billions of results:
>>
>> Let’s say two collections are:
>>
>> deletedItems = [{deletedItemId: 1}, {deletedItemId: 2}...]
>> items = [
>> {
>> id: 1,
>> name: "a"
>> },
>> {   id: 2,
>> name: "b"
>> },
>> {
>>     id: 3,
>> name: "c"
>> }.
>> ]
>>
>> “deletedItems” contain a few documents compared to “items” collection
>> (1mil vs 2-3 bil). If I query them both with a typical query in our system,
>> deletedItems gives a few thousand results but items give tens/hundreds of
>> millions. To use innerJoin, I have to stream the whole items result to
>> worker node over network.
>>
>> Is there a way to avoid this, something like using “deletedItems” result
>> as a query to “items” stream?
>>
>> Thanks in advance for the help
>>
>> Sent from Mail for Windows 10
>>
>>


Re: Use stream result like a query (alternative to innerJoin)

2020-11-23 Thread Joel Bernstein
There are two streams that behave like that.

One is the "nodes" expression, which is not going to work for this use case
because it does everything in memory.

The second one is the "fetch" expression which behaves like a nested loop
join with some limitations. Unfortunately the main limitation is likely to
be a blocker for you which is that it doesn't support one-to-many joins yet.

Joel Bernstein
http://joelsolr.blogspot.com/


On Sun, Nov 22, 2020 at 10:37 AM ufuk yılmaz 
wrote:

> Hi all,
>
> I’m looking for a way to query two collections and find documents that
> exist in both, I know this can be done with innerJoin streaming expression
> but I want to avoid it, since one of the collection streams can possibly
> have billions of results:
>
> Let’s say two collections are:
>
> deletedItems = [{deletedItemId: 1}, {deletedItemId: 2}...]
> items = [
> {
> id: 1,
> name: "a"
> },
> {   id: 2,
> name: "b"
> },
> {
> id: 3,
> name: "c"
> }.
> ]
>
> “deletedItems” contain a few documents compared to “items” collection
> (1mil vs 2-3 bil). If I query them both with a typical query in our system,
> deletedItems gives a few thousand results but items give tens/hundreds of
> millions. To use innerJoin, I have to stream the whole items result to
> worker node over network.
>
> Is there a way to avoid this, something like using “deletedItems” result
> as a query to “items” stream?
>
> Thanks in advance for the help
>
> Sent from Mail for Windows 10
>
>


Use stream result like a query (alternative to innerJoin)

2020-11-22 Thread ufuk yılmaz
Hi all,

I’m looking for a way to query two collections and find documents that exist in
both. I know this can be done with the innerJoin streaming expression, but I want to
avoid it, since one of the collection streams can possibly have billions of
results:

Let’s say two collections are:

deletedItems = [{deletedItemId: 1}, {deletedItemId: 2}...]
items = [
{
id: 1,
name: "a"
},
{   id: 2,
name: "b"
},
{
id: 3,
name: "c"
}.
]

“deletedItems” contains few documents compared to the “items” collection (1 mil vs
2-3 bil). If I query them both with a typical query in our system, deletedItems
gives a few thousand results but items gives tens/hundreds of millions. To use
innerJoin, I would have to stream the whole items result to the worker node over
the network.

Is there a way to avoid this, something like using “deletedItems” result as a 
query to “items” stream?

Thanks in advance for the help

Sent from Mail for Windows 10



Re: SolrJ NestableJsonFacet ordering of query facet

2020-11-19 Thread Jason Gerlowski
Hi Shivram,

I think the short answer is "no".  At least, not without sub-classing
some of the JSON-Facet classes in SolrJ.

But it's hard for me to understand your particular concern without
seeing a concrete example.  If you provide an example (maybe in the
form of a JUnit test snippet showing the actual and expected values),
I may be able to provide more help.
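
In the meantime, one workaround that avoids subclassing (a sketch -- the
facet names here are made up, and it assumes "query"-type facets; keep your
own ordered list of names and look each one up, since the response object
stores them unordered):

  import java.util.Arrays;
  import java.util.List;
  // ...
  List<String> order = Arrays.asList("tour", "guide", "story"); // your facet.json order
  NestableJsonFacet root = queryResponse.getJsonFacetingResponse();
  for (String name : order) {
      NestableJsonFacet f = root.getQueryFacet(name); // null if absent
      if (f != null) {
          System.out.println(name + ": " + f.getCount());
      }
  }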

Best,

Jason

On Fri, Oct 30, 2020 at 1:54 AM Shivam Jha  wrote:
>
> Hi folks,
>
> Does anyone have any advice on this issue?
>
> Thanks,
> Shivam
>
> On Tue, Oct 27, 2020 at 1:20 PM Shivam Jha  wrote:
>
> > Hi folks,
> >
> > Doing some faceted queries using 'facet.json' param and SolrJ, the results
> > of which I am processing using SolrJ NestableJsonFacet class.
> > basically as   *queryResponse.getJsonFacetingResponse() -> returns 
> > *NestableJsonFacet
> > object.
> >
> > But I have noticed it does not maintain the facet-query order in which it
> > was given in *facet.json.*
> > *Direct queries to solr do maintain that order, but not after it comes to
> > Java layer in SolrJ.*
> >
> > Is there a way to make it maintain that order ?
> > Hopefully the question makes sense, if not please let me know I can
> > clarify further.
> >
> > Thanks,
> > Shivam
> >
>
>
> --
> shivamJha


Re: Phrase query no hits when stopwords and FlattenGraphFilterFactory used

2020-11-11 Thread Edward Turner
Many thanks Walter, that's useful information. And yes, if we are able to
keep stopwords, then we will. We have been exploring it because we've
noticed its use leads to a sizable drop in index size (5%, in some of our
tests), which then had the knock-on effect of better performance. (Also,
unfortunately, we do not have the luxury of using super big
machines/storage -- so it's always a balancing act for us.)

Best,
Edd

Edward Turner


On Tue, 10 Nov 2020 at 16:22, Walter Underwood 
wrote:

> By far the simplest solution is to leave stopwords in the index. That also
> improves
> relevance, because it becomes possible to search for “vitamin a” or “to be
> or not to be”.
>
> Stopword remove was a performance and disk space hack from the 1960s. It
> is no
> longer needed. We were keeping stopwords in the index at Infoseek, back in
> 1996.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Nov 10, 2020, at 1:16 AM, Edward Turner  wrote:
> >
> > Hi all,
> >
> > Okay, I've been doing more research about this problem and from what I
> > understand, phrase queries + stopwords are known to have some
> difficulties
> > working together in some circumstances.
> >
> > E.g.,
> >
> https://stackoverflow.com/questions/56802656/stopwords-and-phrase-queries-solr?rq=1
> > https://issues.apache.org/jira/browse/SOLR-6468
> >
> > I was thinking about workarounds, but each solution I've attempted
> doesn't
> > quite work.
> >
> > Therefore, maybe one possible solution is to take a step back and
> > preprocess index/query data going to Solr, something like:
> >
> > String wordsForSolr = removeStopWordsFrom("This is pretend index or query
> > data")
> > // wordsForSolr = "pretend index query data"
> >
> > Off the top of my head, this will by-pass position issues.
> >
> > I will give this a go, but was wondering whether this is something others
> > have done?
> >
> > Best wishes,
> > Edd
> >
> > 
> > Edward Turner
> >
> >
> > On Fri, 6 Nov 2020 at 13:58, Edward Turner  wrote:
> >
> >> Hi all,
> >>
> >> We are experiencing some unexpected behaviour for phrase queries which
> we
> >> believe might be related to the FlattenGraphFilterFactory and stopwords.
> >>
> >> Brief description: when performing a phrase query
> >> "Molecular cloning and evolution of the" => we get expected hits
> >> "Molecular cloning and evolution of the genes" => we get no hits
> >> (unexpected behaviour)
> >>
> >> I think it's worthwhile adding the analyzers we use to help you see what
> >> we're doing:
> >>  Analyzers 
> >>  >>   sortMissingLast="true" omitNorms="true" positionIncrementGap="100">
> >>   
> >>   >> pattern="[- /()]+" />
> >>   >> ignoreCase="true" />
> >>   >> preserveOriginal="false" />
> >>  
> >>   >> generateNumberParts="1" splitOnCaseChange="0"
> preserveOriginal="0"
> >> splitOnNumerics="0" stemEnglishPossessive="1"
> >> generateWordParts="1"
> >> catenateNumbers="0" catenateWords="1" catenateAll="1" />
> >>  
> >>   
> >>   
> >>   >> pattern="[- /()]+" />
> >>   >> ignoreCase="true" />
> >>   >> preserveOriginal="false" />
> >>  
> >>   >> generateNumberParts="1" splitOnCaseChange="0"
> preserveOriginal="0"
> >> splitOnNumerics="0" stemEnglishPossessive="1"
> >> generateWordParts="1"
> >> catenateNumbers="0" catenateWords="0" catenateAll="0" />
> >>   
> >> 
> >>  End of Analyzers 
> >>
> >> ---- Stopwords 
> >> We use the following stopwords:
> >> a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no,
> not,
> >> of, on, or, such, that, the, their, then, there, these, they, this, to,
> >> was, will, with, which
> >>  End of Stopwords 

Re: Phrase query no hits when stopwords and FlattenGraphFilterFactory used

2020-11-10 Thread Walter Underwood
By far the simplest solution is to leave stopwords in the index. That also 
improves
relevance, because it becomes possible to search for “vitamin a” or “to be or 
not to be”.

Stopword removal was a performance and disk-space hack from the 1960s. It is no
longer needed. We were keeping stopwords in the index at Infoseek, back in 1996.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Nov 10, 2020, at 1:16 AM, Edward Turner  wrote:
> 
> Hi all,
> 
> Okay, I've been doing more research about this problem and from what I
> understand, phrase queries + stopwords are known to have some difficulties
> working together in some circumstances.
> 
> E.g.,
> https://stackoverflow.com/questions/56802656/stopwords-and-phrase-queries-solr?rq=1
> https://issues.apache.org/jira/browse/SOLR-6468
> 
> I was thinking about workarounds, but each solution I've attempted doesn't
> quite work.
> 
> Therefore, maybe one possible solution is to take a step back and
> preprocess index/query data going to Solr, something like:
> 
> String wordsForSolr = removeStopWordsFrom("This is pretend index or query
> data")
> // wordsForSolr = "pretend index query data"
> 
> Off the top of my head, this will by-pass position issues.
> 
> I will give this a go, but was wondering whether this is something others
> have done?
> 
> Best wishes,
> Edd
> 
> 
> Edward Turner
> 
> 
> On Fri, 6 Nov 2020 at 13:58, Edward Turner  wrote:
> 
>> Hi all,
>> 
>> We are experiencing some unexpected behaviour for phrase queries which we
>> believe might be related to the FlattenGraphFilterFactory and stopwords.
>> 
>> Brief description: when performing a phrase query
>> "Molecular cloning and evolution of the" => we get expected hits
>> "Molecular cloning and evolution of the genes" => we get no hits
>> (unexpected behaviour)
>> 
>> I think it's worthwhile adding the analyzers we use to help you see what
>> we're doing:
>>  Analyzers 
>> >   sortMissingLast="true" omitNorms="true" positionIncrementGap="100">
>>   
>>  > pattern="[- /()]+" />
>>  > ignoreCase="true" />
>>  > preserveOriginal="false" />
>>  
>>  > generateNumberParts="1" splitOnCaseChange="0" preserveOriginal="0"
>> splitOnNumerics="0" stemEnglishPossessive="1"
>> generateWordParts="1"
>> catenateNumbers="0" catenateWords="1" catenateAll="1" />
>>  
>>   
>>   
>>  > pattern="[- /()]+" />
>>  > ignoreCase="true" />
>>  > preserveOriginal="false" />
>>  
>>  > generateNumberParts="1" splitOnCaseChange="0" preserveOriginal="0"
>> splitOnNumerics="0" stemEnglishPossessive="1"
>> generateWordParts="1"
>> catenateNumbers="0" catenateWords="0" catenateAll="0" />
>>   
>> 
>>  End of Analyzers 
>> 
>>  Stopwords 
>> We use the following stopwords:
>> a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not,
>> of, on, or, such, that, the, their, then, there, these, they, this, to,
>> was, will, with, which
>>  End of Stopwords 
>> 
>>  Analysis Admin page output ---
>> ... And to see what's going on when we're indexing/querying, I created a
>> gist with an image of the (non-verbose) output of the analysis admin page
>> for, index data/query, "Molecular cloning and evolution of the genes":
>> 
>> https://gist.github.com/eddturner/81dbf409703aad402e9009b13d42e43c#file-analysis-admin-png
>> 
>> Hopefully this link works, and you can see that the resulting terms and
>> positions are identical until the FlattenGraphFilterFactory step in the
>> "index" phase.
>> 
>> Final stage of index analysis:
>> (1)molecular (2)cloning (3) (4)evolution (5) (6)genes
>> 
>> Final stage of query analysis:
>> (1)molecular (2)cloning (3) (4)evolution (5) (6) (7)genes
>> 
>> The empty positions are because of stopwords (presumably)
>>  End of Analysis Admin page output ---
>> 
>> Main question:
>> Could someone explain why the FlattenGraphFilterFactory changes the
>> position of the "genes" token? From what we see, this happens after a,
>> "the" (but we've not checked exhaustively, and continue to test).
>> 
>> Perhaps, we are doing something wrong in our analysis setup?
>> 
>> Any help would be much appreciated -- getting phrase queries to work is an
>> important use-case of ours.
>> 
>> Kind regards and thank you in advance,
>> Edd
>> 
>> Edward Turner
>> 



Re: Phrase query no hits when stopwords and FlattenGraphFilterFactory used

2020-11-10 Thread Edward Turner
Hi all,

Okay, I've been doing more research about this problem and from what I
understand, phrase queries + stopwords are known to have some difficulties
working together in some circumstances.

E.g.,
https://stackoverflow.com/questions/56802656/stopwords-and-phrase-queries-solr?rq=1
https://issues.apache.org/jira/browse/SOLR-6468

I was thinking about workarounds, but each solution I've attempted doesn't
quite work.

Therefore, maybe one possible solution is to take a step back and
preprocess index/query data going to Solr, something like:

String wordsForSolr = removeStopWordsFrom("This is pretend index or query
data")
// wordsForSolr = "pretend index query data"

Off the top of my head, this will by-pass position issues.
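
(A throwaway sketch of that helper in Java -- the stopword set must match the
schema's stopword list exactly, or index-side and query-side data will
disagree:)

  import java.util.Arrays;
  import java.util.HashSet;
  import java.util.Set;
  import java.util.stream.Collectors;

  private static final Set<String> STOP = new HashSet<>(Arrays.asList(
      "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
      "in", "into", "is", "it", "no", "not", "of", "on", "or", "such",
      "that", "the", "their", "then", "there", "these", "they", "this",
      "to", "was", "will", "with", "which"));

  static String removeStopWordsFrom(String text) {
      // split on whitespace, drop stopwords case-insensitively, re-join
      return Arrays.stream(text.split("\\s+"))
                   .filter(w -> !STOP.contains(w.toLowerCase()))
                   .collect(Collectors.joining(" "));
  }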

I will give this a go, but was wondering whether this is something others
have done?

Best wishes,
Edd


Edward Turner


On Fri, 6 Nov 2020 at 13:58, Edward Turner  wrote:

> Hi all,
>
> We are experiencing some unexpected behaviour for phrase queries which we
> believe might be related to the FlattenGraphFilterFactory and stopwords.
>
> Brief description: when performing a phrase query
> "Molecular cloning and evolution of the" => we get expected hits
> "Molecular cloning and evolution of the genes" => we get no hits
> (unexpected behaviour)
>
> I think it's worthwhile adding the analyzers we use to help you see what
> we're doing:
>  Analyzers 
> sortMissingLast="true" omitNorms="true" positionIncrementGap="100">
>
> pattern="[- /()]+" />
> ignoreCase="true" />
> preserveOriginal="false" />
>   
> generateNumberParts="1" splitOnCaseChange="0" preserveOriginal="0"
>  splitOnNumerics="0" stemEnglishPossessive="1"
> generateWordParts="1"
>  catenateNumbers="0" catenateWords="1" catenateAll="1" />
>   
>
>
> pattern="[- /()]+" />
> ignoreCase="true" />
> preserveOriginal="false" />
>   
> generateNumberParts="1" splitOnCaseChange="0" preserveOriginal="0"
>  splitOnNumerics="0" stemEnglishPossessive="1"
> generateWordParts="1"
>  catenateNumbers="0" catenateWords="0" catenateAll="0" />
>
> 
>  End of Analyzers 
>
>  Stopwords 
> We use the following stopwords:
> a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not,
> of, on, or, such, that, the, their, then, there, these, they, this, to,
> was, will, with, which
>  End of Stopwords 
>
>  Analysis Admin page output ---
> ... And to see what's going on when we're indexing/querying, I created a
> gist with an image of the (non-verbose) output of the analysis admin page
> for, index data/query, "Molecular cloning and evolution of the genes":
>
> https://gist.github.com/eddturner/81dbf409703aad402e9009b13d42e43c#file-analysis-admin-png
>
> Hopefully this link works, and you can see that the resulting terms and
> positions are identical until the FlattenGraphFilterFactory step in the
> "index" phase.
>
> Final stage of index analysis:
> (1)molecular (2)cloning (3) (4)evolution (5) (6)genes
>
> Final stage of query analysis:
> (1)molecular (2)cloning (3) (4)evolution (5) (6) (7)genes
>
> The empty positions are because of stopwords (presumably)
>  End of Analysis Admin page output ---
>
> Main question:
> Could someone explain why the FlattenGraphFilterFactory changes the
> position of the "genes" token? From what we see, this happens after a,
> "the" (but we've not checked exhaustively, and continue to test).
>
> Perhaps, we are doing something wrong in our analysis setup?
>
> Any help would be much appreciated -- getting phrase queries to work is an
> important use-case of ours.
>
> Kind regards and thank you in advance,
> Edd
> 
> Edward Turner
>


Indicating missing query terms in response

2020-11-08 Thread adfel70
As a Solr query result set may contain documents that do not include all
search terms, we were wondering if it is possible to get an indication, as
part of the response, of which terms were missing.

For example, if our index has the following indexed doc:

{ 
"title": "hello"
}

(assuming '/title/' is a textGeneral field)

The following query *q=hello world&qf=title&defType=edismax&mm=1* will retrieve
the doc even though the search term '/world/' is missing. Is there a built-in
capability to indicate this to the user, so she could refine the query
afterward, say with /+world/?



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Phrase query no hits when stopwords and FlattenGraphFilterFactory used

2020-11-06 Thread Edward Turner
Hi all,

We are experiencing some unexpected behaviour for phrase queries which we
believe might be related to the FlattenGraphFilterFactory and stopwords.

Brief description: when performing a phrase query
"Molecular cloning and evolution of the" => we get expected hits
"Molecular cloning and evolution of the genes" => we get no hits
(unexpected behaviour)

I think it's worthwhile adding the analyzers we use to help you see what
we're doing:
 Analyzers 
[analyzer XML stripped by the list archive; the attribute fragments that
survive in the quoted copies elsewhere in this thread show a charFilter with
pattern="[- /()]+", a filter with ignoreCase="true", a filter with
preserveOriginal="false", and a word-delimiter-style filter with
catenateWords="1" catenateAll="1" at index time but "0" at query time]
 End of Analyzers 

 Stopwords 
We use the following stopwords:
a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not,
of, on, or, such, that, the, their, then, there, these, they, this, to,
was, will, with, which
 End of Stopwords 

 Analysis Admin page output ---
... And to see what's going on when we're indexing/querying, I created a
gist with an image of the (non-verbose) output of the analysis admin page
for, index data/query, "Molecular cloning and evolution of the genes":
https://gist.github.com/eddturner/81dbf409703aad402e9009b13d42e43c#file-analysis-admin-png

Hopefully this link works, and you can see that the resulting terms and
positions are identical until the FlattenGraphFilterFactory step in the
"index" phase.

Final stage of index analysis:
(1)molecular (2)cloning (3) (4)evolution (5) (6)genes

Final stage of query analysis:
(1)molecular (2)cloning (3) (4)evolution (5) (6) (7)genes

The empty positions are because of stopwords (presumably)
 End of Analysis Admin page output ---

Main question:
Could someone explain why the FlattenGraphFilterFactory changes the
position of the "genes" token? From what we see, this happens after a,
"the" (but we've not checked exhaustively, and continue to test).

Perhaps, we are doing something wrong in our analysis setup?

Any help would be much appreciated -- getting phrase queries to work is an
important use-case of ours.

Kind regards and thank you in advance,
Edd

Edward Turner


Re: Simulate facet.exists for json query facets

2020-10-30 Thread Michael Gibney
>If all of those facet queries are _known_ to be a performance hit,
you might be able to do something custom.That would require
custom code though and I wouldn’t go there unless you can
demonstrate need.

Yeah ... indeed if those facet queries are relatively static (and thus
cacheable ... even if there are a lot of them), an appropriately-sized
filterCache would allow them to be cached to good effect and then the
performance hit should be negligible. Knowing what the queries are up
front, you could even add them to your warming queries.
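
(A sketch of what that could look like in the <query> section of
solrconfig.xml, using one of the category queries from this thread as an fq
so its DocSet lands in the filterCache -- one <lst> per facet query:)

  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">*:*</str>
        <str name="fq">+categoryId:(21450 21453)</str>
        <str name="rows">0</str>
      </lst>
    </arr>
  </listener>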

It'd also be unusual (though possible, sure?) to run these kinds of
facet queries with no intention of ever conditionally following up in
a way that would want the actual results/docSet -- even if the
initial/more common query only cares about boolean existence.

The case in which this type of functionality really might be indicated is:
1. only care about boolean result (obvious, ok)
2. dynamic (i.e., not-particularly-cacheable) queries
3. never intend to follow up with a request that calls for full results

If both of the first two conditions hold, and especially if the third
also holds, there would in principle definitely be efficiency to be
gained by early termination (and avoiding the creation of a DocSet,
which at the moment happens unconditionally for every facet query).
I'm also thinking about this through the lens of bringing the JSON
Facet API to parity with the legacy facet API, fwiw ...

On Fri, Oct 30, 2020 at 9:02 AM Erick Erickson  wrote:
>
> I don’t think there’s anything to do what you’re asking OOB.
>
> If all of those facet queries are _known_ to be a performance hit,
> you might be able to do something custom.That would require
> custom code though and I wouldn’t go there unless you can
> demonstrate need.
>
> If you issue a debug=timing you’ll see the time each component
> takes,  and there’s a separate entry for faceting so that’ll give you
> a clue whether it’s worth the effort.
>
> Best,
> Erick
>
> > On Oct 30, 2020, at 8:10 AM, Michael Gibney  
> > wrote:
> >
> > Michael, sorry for the confusion; I was positing a *hypothetical*
> > "exists()" function that doesn't currently exist, that *is* an
> > aggregate function, and the *does* stop early. I didn't account for
> > the fact that there's already an "exists()" function *query* that
> > behaves very differently. So yes, definitely confusing :-). I guess
> > choosing a different name for the proposed aggregate function would
> > make sense. I was suggesting it mostly as an alternative to extending
> > the syntax of JSON Facet "query" facet type, and to say that I think
> > the implementation of such an aggregate function would be pretty
> > straightforward.
> >
> > On Fri, Oct 30, 2020 at 3:44 AM michael dürr  wrote:
> >>
> >> @Erick
> >>
> >> Sorry! I chose a simple example as I wanted to reduce complexity.
> >> In detail:
> >> * We have distinct contents like tours, offers, events, etc which
> >> themselves may be categorized: A tour may be a hiking tour, a
> >> mountaineering tour, ...
> >> * We have hundreds of customers that want to facet their searches to that
> >> content types but often with distinct combinations of categories, i.e.
> >> customer A wants his facet "tours" to only count hiking tours, customer B
> >> only mountaineering tours, customer C a combination of both, etc
> >> * We use "query" facets as each facet request will be build dynamically (it
> >> is not feasible to aggregate certain categories and add them as an
> >> additional solr schema field as we have hundreds of different 
> >> combinations).
> >> * Anyways, our ui only requires adding a toggle to filter for (for example)
> >> "tours" in case a facet result is present. We do not care about the number
> >> of tours.
> >> * As we have millions of contents and dozens of content types (and dozens
> >> of categories per content type) such queries may take a very long time.
> >>
> >> A complex example may look like this: [...]

Re: Simulate facet.exists for json query facets

2020-10-30 Thread Erick Erickson
I don’t think there’s anything to do what you’re asking OOB.

If all of those facet queries are _known_ to be a performance hit,
you might be able to do something custom. That would require
custom code though and I wouldn’t go there unless you can
demonstrate need.

If you issue a debug=timing you’ll see the time each component 
takes,  and there’s a separate entry for faceting so that’ll give you
a clue whether it’s worth the effort.
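
E.g. (a sketch, reusing one of the facet queries from this thread):

  curl http://localhost:8983/solr/portal/select -d \
  "q=*:*\
  &rows=0\
  &debug=timing\
  &json.facet={ tour: { type : query, q : \"+categoryId:(21450 21453)\" } }"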

Best,
Erick

> On Oct 30, 2020, at 8:10 AM, Michael Gibney  wrote:
> 
> Michael, sorry for the confusion; I was positing a *hypothetical*
> "exists()" function that doesn't currently exist, that *is* an
> aggregate function, and the *does* stop early. I didn't account for
> the fact that there's already an "exists()" function *query* that
> behaves very differently. So yes, definitely confusing :-). I guess
> choosing a different name for the proposed aggregate function would
> make sense. I was suggesting it mostly as an alternative to extending
> the syntax of JSON Facet "query" facet type, and to say that I think
> the implementation of such an aggregate function would be pretty
> straightforward.
> 
> On Fri, Oct 30, 2020 at 3:44 AM michael dürr  wrote:
>> 
>> @Erick
>> 
>> Sorry! I chose a simple example as I wanted to reduce complexity.
>> In detail:
>> * We have distinct contents like tours, offers, events, etc which
>> themselves may be categorized: A tour may be a hiking tour, a
>> mountaineering tour, ...
>> * We have hundreds of customers that want to facet their searches to that
>> content types but often with distinct combinations of categories, i.e.
>> customer A wants his facet "tours" to only count hiking tours, customer B
>> only mountaineering tours, customer C a combination of both, etc
>> * We use "query" facets as each facet request will be build dynamically (it
>> is not feasible to aggregate certain categories and add them as an
>> additional solr schema field as we have hundreds of different combinations).
>> * Anyways, our ui only requires adding a toggle to filter for (for example)
>> "tours" in case a facet result is present. We do not care about the number
>> of tours.
>> * As we have millions of contents and dozens of content types (and dozens
>> of categories per content type) such queries may take a very long time.
>> 
>> A complex example may look like this:
>> 
>> q=*:*
>> &json.facet={
>>   tour:      { type : query, q : "+categoryId:(21450 21453)" },
>>   guide:     { type : query, q : "+categoryId:(21105 21401 21301 21302 21303 21304 21305 21403 21404)" },
>>   story:     { type : query, q : "+categoryId:21515" },
>>   condition: { type : query, q : "+categoryId:21514" },
>>   hut:       { type : query, q : "+categoryId:8510" },
>>   skiresort: { type : query, q : "+categoryId:21493" },
>>   offer:     { type : query, q : "+categoryId:21462" },
>>   lodging:   { type : query, q : "+categoryId:6061" },
>>   event:     { type : query, q : "+categoryId:21465" },
>>   poi:       { type : query, q : "+(+categoryId:6000 -categoryId:(6061 21493 8510))" },
>>   authors:   { type : query, q : "+categoryId:(21205 21206)" },
>>   partners:  { type : query, q : "+categoryId:21200" },
>>   list:      { type : query, q : "+categoryId:21481" }
>> }
>> &rows=0
>> 
>> @Michael
>> 
>> Thanks for your suggestion but this does not work as
>> * the facet module expects an aggregate function (which i simply added by
>> embracing your call with sum(...))
>> * and (please correct me if I am wrong) the exists() function not stops on
>> the first match, but counts the number of results for which the query
>> matches a document.



Re: Simulate facet.exists for json query facets

2020-10-30 Thread Michael Gibney
Michael, sorry for the confusion; I was positing a *hypothetical*
"exists()" function that doesn't currently exist, that *is* an
aggregate function, and that *does* stop early. I didn't account for
the fact that there's already an "exists()" function *query* that
behaves very differently. So yes, definitely confusing :-). I guess
choosing a different name for the proposed aggregate function would
make sense. I was suggesting it mostly as an alternative to extending
the syntax of JSON Facet "query" facet type, and to say that I think
the implementation of such an aggregate function would be pretty
straightforward.

On Fri, Oct 30, 2020 at 3:44 AM michael dürr  wrote:
>
> @Erick
>
> Sorry! I chose a simple example as I wanted to reduce complexity.
> In detail:
> * We have distinct contents like tours, offers, events, etc which
> themselves may be categorized: A tour may be a hiking tour, a
> mountaineering tour, ...
> * We have hundreds of customers that want to facet their searches to that
> content types but often with distinct combinations of categories, i.e.
> customer A wants his facet "tours" to only count hiking tours, customer B
> only mountaineering tours, customer C a combination of both, etc
> * We use "query" facets as each facet request will be build dynamically (it
> is not feasible to aggregate certain categories and add them as an
> additional solr schema field as we have hundreds of different combinations).
> * Anyways, our ui only requires adding a toggle to filter for (for example)
> "tours" in case a facet result is present. We do not care about the number
> of tours.
> * As we have millions of contents and dozens of content types (and dozens
> of categories per content type) such queries may take a very long time.
>
> A complex example may look like this:
>
> q=*:*
> &json.facet={
>   tour:      { type : query, q : "+categoryId:(21450 21453)" },
>   guide:     { type : query, q : "+categoryId:(21105 21401 21301 21302 21303 21304 21305 21403 21404)" },
>   story:     { type : query, q : "+categoryId:21515" },
>   condition: { type : query, q : "+categoryId:21514" },
>   hut:       { type : query, q : "+categoryId:8510" },
>   skiresort: { type : query, q : "+categoryId:21493" },
>   offer:     { type : query, q : "+categoryId:21462" },
>   lodging:   { type : query, q : "+categoryId:6061" },
>   event:     { type : query, q : "+categoryId:21465" },
>   poi:       { type : query, q : "+(+categoryId:6000 -categoryId:(6061 21493 8510))" },
>   authors:   { type : query, q : "+categoryId:(21205 21206)" },
>   partners:  { type : query, q : "+categoryId:21200" },
>   list:      { type : query, q : "+categoryId:21481" }
> }
> &rows=0
>
> @Michael
>
> Thanks for your suggestion but this does not work as
> * the facet module expects an aggregate function (which i simply added by
> embracing your call with sum(...))
> * and (please correct me if I am wrong) the exists() function not stops on
> the first match, but counts the number of results for which the query
> matches a document.


Re: Simulate facet.exists for json query facets

2020-10-30 Thread michael dürr
@Erick

Sorry! I chose a simple example as I wanted to reduce complexity.
In detail:
* We have distinct contents like tours, offers, events, etc., which
themselves may be categorized: a tour may be a hiking tour, a
mountaineering tour, ...
* We have hundreds of customers that want to facet their searches on those
content types, but often with distinct combinations of categories, i.e.
customer A wants his facet "tours" to only count hiking tours, customer B
only mountaineering tours, customer C a combination of both, etc.
* We use "query" facets as each facet request will be built dynamically (it
is not feasible to aggregate certain categories and add them as an
additional solr schema field as we have hundreds of different combinations).
* Anyways, our ui only requires adding a toggle to filter for (for example)
"tours" in case a facet result is present. We do not care about the number
of tours.
* As we have millions of contents and dozens of content types (and dozens
of categories per content type) such queries may take a very long time.

A complex example may look like this:

q=*:*
&json.facet={
  tour:      { type : query, q : "+categoryId:(21450 21453)" },
  guide:     { type : query, q : "+categoryId:(21105 21401 21301 21302 21303 21304 21305 21403 21404)" },
  story:     { type : query, q : "+categoryId:21515" },
  condition: { type : query, q : "+categoryId:21514" },
  hut:       { type : query, q : "+categoryId:8510" },
  skiresort: { type : query, q : "+categoryId:21493" },
  offer:     { type : query, q : "+categoryId:21462" },
  lodging:   { type : query, q : "+categoryId:6061" },
  event:     { type : query, q : "+categoryId:21465" },
  poi:       { type : query, q : "+(+categoryId:6000 -categoryId:(6061 21493 8510))" },
  authors:   { type : query, q : "+categoryId:(21205 21206)" },
  partners:  { type : query, q : "+categoryId:21200" },
  list:      { type : query, q : "+categoryId:21481" }
}
&rows=0

@Michael

Thanks for your suggestion but this does not work as
* the facet module expects an aggregate function (which I simply added by
embracing your call with sum(...))
* and (please correct me if I am wrong) the exists() function does not stop on
the first match, but counts the number of results for which the query
matches a document.


Re: SolrJ NestableJsonFacet ordering of query facet

2020-10-29 Thread Shivam Jha
Hi folks,

Does anyone have any advice on this issue?

Thanks,
Shivam

On Tue, Oct 27, 2020 at 1:20 PM Shivam Jha  wrote:

> Hi folks,
>
> Doing some faceted queries using 'facet.json' param and SolrJ, the results
> of which I am processing using SolrJ NestableJsonFacet class.
> basically as   *queryResponse.getJsonFacetingResponse() -> returns 
> *NestableJsonFacet
> object.
>
> But I have noticed it does not maintain the facet-query order in which it
> was given in *facet.json.*
> *Direct queries to solr do maintain that order, but not after it comes to
> Java layer in SolrJ.*
>
> Is there a way to make it maintain that order ?
> Hopefully the question makes sense, if not please let me know I can
> clarify further.
>
> Thanks,
> Shivam
>


-- 
shivamJha


Re: Simulate facet.exists for json query facets

2020-10-28 Thread Michael Gibney
Separately, and in parallel to Erick's question: indeed I'm not aware
of any way to do this currently, but I *can* imagine cases where this
would be useful. I have a sense this could be cleanly implemented as a
stat facet function
(https://lucene.apache.org/solr/guide/8_6/json-facet-api.html#stat-facet-functions),
e.g.:

curl http://localhost:8983/solr/portal/select -d \
"q=*:*\
&json.facet={
  tour: \"exists(+categoryId:6000 -categoryId:(6061 21493 8510))\"
}\
&rows=0"

The return value of the `exists` function could be boolean, which
would be semantically clearer than capping count to 1, as I gather
`facet.exists` does. For the same reason, implementing this as a
function would probably be better than adding this functionality to
the `query` facet type, which carries certain useful assumptions (the
meaning of the "count" attribute in the response, the ability to nest
stats and subfacets, etc.) ... just thinking out loud at the moment
...

On Wed, Oct 28, 2020 at 9:17 AM Erick Erickson  wrote:
>
> This really sounds like an XY problem. The whole point of facets is
> to count the number of documents that have a value in some
> number of buckets. So trying to stop your facet query as soon
> as it matches a hit for the first time seems like an odd thing to do.
>
> So what’s the “X”? In other words, what is the problem you’re trying
> to solve at a high level? Perhaps there’s a better way to figure this
> out.
>
> Best,
> Erick
>
> > On Oct 28, 2020, at 3:48 AM, michael dürr  wrote:
> >
> > Hi,
> >
> > I use json facets of type 'query'. As these queries are pretty slow and I'm
> > only interested in whether there is a match or not, I'd like to restrict
> > the query execution similar to the standard facetting (like with the
> > facet.exists parameter). My simplified query looks something like this (in
> > reality *:* may be replaced by a complex edismax query and multiple
> > subfacets similar to "tour" occur):
> >
> > curl http://localhost:8983/solr/portal/select -d \
> > "q=*:*\
> > &json.facet={
> >  tour:{
> >    type : query,
> >    q: \"+(+categoryId:6000 -categoryId:(6061 21493 8510))\"
> >  }
> > }\
> > &rows=0"
> >
> > Is there any possibility to modify my request to ensure that the facet
> > query stops as soon as it matches a hit for the first time?
> >
> > Thanks!
> > Michael
>


Re: Simulate facet.exists for json query facets

2020-10-28 Thread Erick Erickson
This really sounds like an XY problem. The whole point of facets is
to count the number of documents that have a value in some
number of buckets. So trying to stop your facet query as soon
as it matches a hit for the first time seems like an odd thing to do.

So what’s the “X”? In other words, what is the problem you’re trying
to solve at a high level? Perhaps there’s a better way to figure this
out.

Best,
Erick

> On Oct 28, 2020, at 3:48 AM, michael dürr  wrote:
> 
> Hi,
> 
> I use json facets of type 'query'. As these queries are pretty slow and I'm
> only interested in whether there is a match or not, I'd like to restrict
> the query execution similar to the standard facetting (like with the
> facet.exists parameter). My simplified query looks something like this (in
> reality *:* may be replaced by a complex edismax query and multiple
> subfacets similar to "tour" occur):
> 
> curl http://localhost:8983/solr/portal/select -d \
> "q=*:*\
> &json.facet={
>  tour:{
>    type : query,
>    q: \"+(+categoryId:6000 -categoryId:(6061 21493 8510))\"
>  }
> }\
> &rows=0"
> 
> Is there any possibility to modify my request to ensure that the facet
> query stops as soon as it matches a hit for the first time?
> 
> Thanks!
> Michael



Simulate facet.exists for json query facets

2020-10-28 Thread michael dürr
Hi,

I use json facets of type 'query'. As these queries are pretty slow and I'm
only interested in whether there is a match or not, I'd like to restrict
the query execution, similar to standard faceting (like with the
facet.exists parameter). My simplified query looks something like this (in
reality *:* may be replaced by a complex edismax query and multiple
subfacets similar to "tour" occur):

curl http://localhost:8983/solr/portal/select -d \
"q=*:*\
&json.facet={
  tour:{
    type : query,
    q: \"+(+categoryId:6000 -categoryId:(6061 21493 8510))\"
  }
}\
&rows=0"

Is there any possibility to modify my request to ensure that the facet
query stops as soon as it matches a hit for the first time?

Thanks!
Michael


Re: Avoiding single digit and single character ONLY query by putting them in stopwords list

2020-10-27 Thread Mark Robinson
Thanks!

Mark

On Tue, Oct 27, 2020 at 11:56 AM Dave  wrote:

> Agreed. Just a JavaScript check on the input box would work fine for 99%
> of cases, unless something automatic is running them in which case just
> server side redirect back to the form.
>
> > On Oct 27, 2020, at 11:54 AM, Mark Robinson 
> wrote:
> >
> > Hi  Konstantinos ,
> >
> > Thanks for the reply.
> > I too feel the same. Wanted to find what others also in the Solr world
> > thought about it.
> >
> > Thanks!
> > Mark.
> >
> >> On Tue, Oct 27, 2020 at 11:45 AM Konstantinos Koukouvis <
> >> konstantinos.koukou...@mecenat.com> wrote:
> >>
> >> Oh hi Mark!
> >>
> >> Why would you wanna do such a thing in the solr end. Imho it would be
> much
> >> more clean and easy to do it on the client side
> >>
> >> Regards,
> >> Konstantinos
> >>
> >>
> >>>> On 27 Oct 2020, at 16:42, Mark Robinson 
> wrote:
> >>>
> >>> Hello,
> >>>
> >>> I want to block queries having only a digit like "1" or "2" ,... or
> >>> just a letter like "a" or "b" ...
> >>>
> >>> Is it a good idea to block them ... ie just single digits 0 - 9 and  a
> -
> >> z
> >>> by putting them as a stop word? The problem with this I can anticipate
> >> is a
> >>> query like "1 inch screw" can have the important information "1"
> stripped
> >>> out if I tokenize it.
> >>>
> >>> So what would be a good way to avoid  single digit only and single
> letter
> >>> only queries, from the Solr end?
> >>> Or should I not do this at the Solr end at all?
> >>>
> >>> Could someone please share your thoughts?
> >>>
> >>> Thanks!
> >>> Mark
> >>
> >> ==
> >> Konstantinos Koukouvis
> >> konstantinos.koukou...@mecenat.com
> >>
> >> Using Golang and Solr? Try this: https://github.com/mecenat/solr
> >>
> >>
> >>
> >>
> >>
> >>
>


Re: Avoiding single digit and single character ONLY query by putting them in stopwords list

2020-10-27 Thread Dave
Agreed. Just a JavaScript check on the input box would work fine for 99% of 
cases, unless something automatic is running them in which case just server 
side redirect back to the form. 
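
(A minimal sketch of that check:)

  // reject a query that is just a single letter or digit
  function isSingleCharQuery(q) {
    return /^[a-z0-9]$/i.test(q.trim());
  }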

> On Oct 27, 2020, at 11:54 AM, Mark Robinson  wrote:
> 
> Hi  Konstantinos ,
> 
> Thanks for the reply.
> I too feel the same. Wanted to find what others also in the Solr world
> thought about it.
> 
> Thanks!
> Mark.
> 
>> On Tue, Oct 27, 2020 at 11:45 AM Konstantinos Koukouvis <
>> konstantinos.koukou...@mecenat.com> wrote:
>> 
>> Oh hi Mark!
>> 
>> Why would you wanna do such a thing in the solr end. Imho it would be much
>> more clean and easy to do it on the client side
>> 
>> Regards,
>> Konstantinos
>> 
>> 
>>>> On 27 Oct 2020, at 16:42, Mark Robinson  wrote:
>>> 
>>> Hello,
>>> 
>>> I want to block queries having only a digit like "1" or "2" ,... or
>>> just a letter like "a" or "b" ...
>>> 
>>> Is it a good idea to block them ... ie just single digits 0 - 9 and  a -
>> z
>>> by putting them as a stop word? The problem with this I can anticipate
>> is a
>>> query like "1 inch screw" can have the important information "1" stripped
>>> out if I tokenize it.
>>> 
>>> So what would be a good way to avoid  single digit only and single letter
>>> only queries, from the Solr end?
>>> Or should I not do this at the Solr end at all?
>>> 
>>> Could someone please share your thoughts?
>>> 
>>> Thanks!
>>> Mark
>> 
>> ==
>> Konstantinos Koukouvis
>> konstantinos.koukou...@mecenat.com
>> 
>> Using Golang and Solr? Try this: https://github.com/mecenat/solr
>> 
>> 
>> 
>> 
>> 
>> 


Re: Avoiding single digit and single character ONLY query by putting them in stopwords list

2020-10-27 Thread Mark Robinson
Hi  Konstantinos ,

Thanks for the reply.
I too feel the same. Wanted to find what others also in the Solr world
thought about it.

Thanks!
Mark.

On Tue, Oct 27, 2020 at 11:45 AM Konstantinos Koukouvis <
konstantinos.koukou...@mecenat.com> wrote:

> Oh hi Mark!
>
> Why would you wanna do such a thing in the solr end. Imho it would be much
> more clean and easy to do it on the client side
>
> Regards,
> Konstantinos
>
>
> > On 27 Oct 2020, at 16:42, Mark Robinson  wrote:
> >
> > Hello,
> >
> > I want to block queries having only a digit like "1" or "2" ,... or
> > just a letter like "a" or "b" ...
> >
> > Is it a good idea to block them ... ie just single digits 0 - 9 and  a -
> z
> > by putting them as a stop word? The problem with this I can anticipate
> is a
> > query like "1 inch screw" can have the important information "1" stripped
> > out if I tokenize it.
> >
> > So what would be a good way to avoid  single digit only and single letter
> > only queries, from the Solr end?
> > Or should I not do this at the Solr end at all?
> >
> > Could someone please share your thoughts?
> >
> > Thanks!
> > Mark
>
> ==
> Konstantinos Koukouvis
> konstantinos.koukou...@mecenat.com
>
> Using Golang and Solr? Try this: https://github.com/mecenat/solr
>
>
>
>
>
>


Re: Avoiding single digit and single character ONLY query by putting them in stopwords list

2020-10-27 Thread Konstantinos Koukouvis
Oh hi Mark!

Why would you wanna do such a thing in the solr end. Imho it would be much more 
clean and easy to do it on the client side

Regards,
Konstantinos


> On 27 Oct 2020, at 16:42, Mark Robinson  wrote:
> 
> Hello,
> 
> I want to block queries having only a digit like "1" or "2" ,... or
> just a letter like "a" or "b" ...
> 
> Is it a good idea to block them ... ie just single digits 0 - 9 and  a - z
> by putting them as a stop word? The problem with this I can anticipate is a
> query like "1 inch screw" can have the important information "1" stripped
> out if I tokenize it.
> 
> So what would be a good way to avoid  single digit only and single letter
> only queries, from the Solr end?
> Or should I not do this at the Solr end at all?
> 
> Could someone please share your thoughts?
> 
> Thanks!
> Mark

==
Konstantinos Koukouvis
konstantinos.koukou...@mecenat.com

Using Golang and Solr? Try this: https://github.com/mecenat/solr







Avoiding single digit and single character ONLY query by putting them in stopwords list

2020-10-27 Thread Mark Robinson
Hello,

I want to block queries having only a digit like "1" or "2" ,... or
just a letter like "a" or "b" ...

Is it a good idea to block them ... ie just single digits 0 - 9 and  a - z
by putting them as a stop word? The problem with this I can anticipate is a
query like "1 inch screw" can have the important information "1" stripped
out if I tokenize it.

So what would be a good way to avoid  single digit only and single letter
only queries, from the Solr end?
Or should I not do this at the Solr end at all?

Could someone please share your thoughts?

Thanks!
Mark
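
For the record, the client-side guard suggested in the replies can be a few
lines of application code. A minimal sketch (hypothetical helper, not a Solr
feature):

public class QueryGuard {
    // True for queries that are exactly one ASCII letter or digit, so they
    // can be rejected before ever reaching Solr; multi-term queries such as
    // "1 inch screw" pass through untouched.
    static boolean isDegenerate(String userInput) {
        String q = userInput == null ? "" : userInput.trim();
        return q.matches("(?i)[a-z0-9]");
    }
}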


SolrJ NestableJsonFacet ordering of query facet

2020-10-27 Thread Shivam Jha
Hi folks,

Doing some faceted queries using the 'json.facet' param and SolrJ, the results
of which I am processing using SolrJ's NestableJsonFacet class; basically,
queryResponse.getJsonFacetingResponse() returns a NestableJsonFacet object.

But I have noticed it does not maintain the facet-query order in which it
was given in json.facet.
Direct queries to Solr do maintain that order, but not after it comes to the
Java layer in SolrJ.

Is there a way to make it maintain that order?
Hopefully the question makes sense; if not, please let me know and I can
clarify further.

Thanks,
Shivam
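
For what it's worth, one workaround is to stop depending on the response
object's iteration order and instead look each facet up by name in the order it
was declared. A rough sketch (accessor names recalled from the SolrJ javadocs,
so worth verifying against your version; the facet names are whatever you put
in json.facet):

import java.util.List;

import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.json.NestableJsonFacet;

public class OrderedFacetReader {
    // Iterate our own ordered list of facet names (the order used in the
    // json.facet request) rather than the unordered map in the response.
    static void printInDeclaredOrder(QueryResponse rsp, List<String> declaredOrder) {
        NestableJsonFacet root = rsp.getJsonFacetingResponse();
        for (String name : declaredOrder) {
            NestableJsonFacet facet = root.getQueryFacet(name); // lookup by name
            if (facet != null) {
                System.out.println(name + " -> count=" + facet.getCount());
            }
        }
    }
}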


ElevateIds - should I remove those that might be filtered off in the underlying query

2020-10-19 Thread Mark Robinson
Hi,

Suppose I have, say, 50 elevateIds, and I have a way to identify those that
would get filtered out of the query by predefined fqs, so in reality they would
never even be in the results and hence never be elevated.

Is there any advantage to leaving them out when building the elevateIds list
(i.e., can I gain performance), or does keeping them in the elevateIds make no
performance difference?

Thanks!
Mark
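
For context, a request using the parameters in question might look like the
following sketch (collection, query, and ids are hypothetical):

http://localhost:8983/solr/products/select?q=shoes&enableElevation=true&elevateIds=doc1,doc5,doc9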


Solr 8.6.2 Facets query for Nested documents

2020-10-13 Thread Abhay Kumar
e":"Principal Investigator",
 "investigatorsaffiliation":"The Royal Women''s Hospital, Melbourne 
Australia",
 "CongressScore":"",
 "TrialsScore":"Low",
 "PublicationScore":"",
 "_nest_parent_":"NCT04372953",
 "phase":"",
 "studytype":"",
 "source":"",
 "title":"",
 "sponsorrole":[
""
 ],
 "therapeuticareaname":"",
 "text_suggest":[
""
 ],
 "sponsorname":[
    ""
 ],
 "status":"",
 "_version_":1680437253090836480
  }
   ],
   "therapeuticareas":[
  {
 "id":"ta-0-NCT04372953",
 "therapeuticareaname":"Premature Birth",
 "text_prefixauto":"Premature Birth",
 "text_suggest":[
"Premature Birth"
 ],
 "diseaseareas":[
""
 ],
 "nodetype":"cnode",
 "_nest_parent_":"NCT04372953",
 "phase":"",
 "studytype":"",
 "investigatorsaffiliation":"",
 "source":"",
 "title":"",
 "sponsorrole":[
""
 ],
 "investigatorname":[
""
 ],
 "investigatorrole":"",
 "sponsorname":[
""
 ],
 "status":"",
 "_version_":1680437253090836480,
 "therapeuticareaname_facet":"Premature Birth",
 "diseaseareas_facet":[
""
 ]
  },
  {
 "id":"ta-1-NCT04372953",
 "therapeuticareaname":"Lung Injury",
 "text_prefixauto":"Lung Injury",
 "text_suggest":[
"Lung Injury"
 ],
 "diseaseareas":[
"Respiratory tract diseases"
 ],
 "nodetype":"cnode",
 "_nest_parent_":"NCT04372953",
 "phase":"",
 "studytype":"",
 "investigatorsaffiliation":"",
 "source":"",
 "title":"",
 "sponsorrole":[
""
 ],
 "investigatorname":[
""
 ],
 "investigatorrole":"",
 "sponsorname":[
""
 ],
 "status":"",
 "_version_":1680437253090836480,
 "therapeuticareaname_facet":"Lung Injury",
 "diseaseareas_facet":[
"Respiratory tract diseases"
 ]
  }
   ]
}


Now, I am trying to query based on "therapeuticareaname" field. And I need 
following details.


  1.  I need all the parent documents; the following query is working for me.

http://localhost:8983/solr/ClinicalTrial2/select?q={!parent which='-nodetype:* *:*'}therapeuticareaname:lung

  2.  Now I want facets for the fields therapeuticareaname, diseaseareas and
facilityname.
I am using the below query to get the facets:

http://localhost:8983/solr/ClinicalTrial2/select?q=(therapeuticareaname:lung)&json.facet={
  diseaseareas: { type: terms, field: diseaseareas_facet, facet: { diseaseareaCount: "unique(_root_)" } },
  therapeuticareas: { type: terms, field: therapeuticareaname_facet, facet: { therapeuticareaCount: "unique(_root_)" } },
  facilities: { type: terms, field: facilityname, facet: { facilityCount: "unique(_root_)" } }
}

But the issue I have is that I am getting facets for therapeuticareaname and
diseaseareas, but not for the field "facilityname".

Please help me form a facet query to get the facets for the field "facilityname".

Here is my output: (screenshot omitted)

Thanks.
Abhay
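
One thing worth checking (a hedged guess, not a verified fix): the facet
request's q matches therapeutic-area child documents, so the facet domain
contains no documents that carry facilityname. If facilities are sibling child
documents, the domain has to be widened first, for example by hopping up to the
matching parents and back down to all children:

json.facet={
  facilities: {
    type: query,
    q: "*:*",
    domain: { blockParent: "-nodetype:* *:*" },
    facet: {
      facilityNames: {
        type: terms,
        field: facilityname,
        domain: { blockChildren: "-nodetype:* *:*" },
        facet: { facilityCount: "unique(_root_)" }
      }
    }
  }
}

Here "-nodetype:* *:*" is the same all-parents filter used in the {!parent}
query above; the blockParent/blockChildren domain changes are part of the JSON
Facet API.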



'Exists' query not working for geospatial fields in Solr >= 8.5.0?

2020-10-08 Thread Ondra Horak
Hi,

I just found that Solr queries like field:* are no longer working for
fields of type SpatialRecursivePrefixTreeFieldType. They work in 8.4.1;
since 8.5.0 they just give an empty result. Is this intended
behaviour, or a bug?

Looking at Solr release notes I'd say it might be a consequence of a
bugfix introduced in Solr 8.5.0:
SOLR-11746: Adding existence queries for PointFields.
DocValuesFieldExistsQuery and NormsFieldExistsQuery are used for
existence queries when possible.
(Houston Putman, hossman, Kai Chan)

What would be the best way to replace this type of query? I tried to
use a query like field:** as a workaround, which works but is quite
inefficient. Another workaround is to search with a large distance to
match any possible point. This is pretty fast (in fact, with my data
it is even faster than field:* in 8.4.1) but it seems like an ugly
hack. Anyway, I would welcome more transparent behaviour.


Regards,

Ondra Horak
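
For reference, the large-distance workaround described above can be written
with geofilt; since no point on Earth is more than roughly 20,038 km from any
other, a filter like the following sketch (field name hypothetical) matches
every document that has an indexed point:

fq={!geofilt sfield=location_rpt pt=0,0 d=20038}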


RE: Slow Solr 8 response for long query

2020-10-05 Thread Permakoff, Vadim
Hi Erick,
Thank you for looking into my question.

Below are the timings for Solr 6 and Solr 8. I see that the search time depends on 
grouping: without grouping it is very fast and approximately the same for both Solr 6 
and 8, but with grouping Solr 8 is much slower. The difference grows with the 
number of returned results (groups). For 30 results the difference is not that 
big, but for 300 results Solr 6's speed stays almost the same while Solr 8 is about 
10 times slower. The data is the same, and the indexing was done from scratch.
The documents are nested; we are searching children and grouping on a field 
which may group children from different parents, but in this particular case the 
groups come from only one parent.
This is the query example:
qt=/select=json=true=0=30=_text_sp_=VERY_LONG_BOOLEAN_QUERY_USING_SEVERAL_INDEXED_STRING_FIELDS_FROM_CHILDREN=OR=q=true=_nested_id:child=true=true=uniqueId=true=id,score=timing

Solr-8:
  "debug":{
"timing":{
  "time":22258.0,
  "prepare":{
"time":20.0,
"query":{
  "time":20.0},
"facet":{
  "time":0.0},
"facet_module":{
  "time":0.0},
"mlt":{
  "time":0.0},
"highlight":{
  "time":0.0},
"stats":{
  "time":0.0},
    "expand":{
  "time":0.0},
"terms":{
  "time":0.0},
"debug":{
  "time":0.0}},
  "process":{
"time":22210.0,
"query":{
  "time":22210.0},
"facet":{
  "time":0.0},
"facet_module":{
  "time":0.0},
"mlt":{
  "time":0.0},
"highlight":{
  "time":0.0},
"stats":{
  "time":0.0},
"expand":{
  "time":0.0},
"terms":{
  "time":0.0},
"debug":{
  "time":0.0}

Solr-6:
  "debug":{
"timing":{
  "time":16157.0,
  "prepare":{
"time":14.0,
"query":{
  "time":14.0},
"facet":{
  "time":0.0},
"facet_module":{
  "time":0.0},
"mlt":{
  "time":0.0},
"highlight":{
  "time":0.0},
"stats":{
  "time":0.0},
"expand":{
  "time":0.0},
"terms":{
  "time":0.0},
"debug":{
  "time":0.0}},
  "process":{
"time":16133.0,
"query":{
  "time":16133.0},
"facet":{
  "time":0.0},
"facet_module":{
  "time":0.0},
"mlt":{
  "time":0.0},
"highlight":{
  "time":0.0},
"stats":{
  "time":0.0},
"expand":{
  "time":0.0},
"terms":{
  "time":0.0},
"debug":{
  "time":0.0}

Best Regards,
Vadim Permakoff


-Original Message-
From: Erick Erickson  
Sent: Wednesday, September 30, 2020 8:04 AM
To: solr-user@lucene.apache.org
Subject: Re: Slow Solr 8 response for long query


Increasing the number of rows should not have this kind of impact in either 
version of Solr, so I think there’s something fundamentally strange in your 
setup.

Whether returning 10 or 300 documents, every document has to be scored. There 
are two differences between 10 and 300 rows:

1> when returning 10 rows, Solr keeps a sorted list of 10 docs, just IDs and 
scores (assuming you’re sorting by relevance); when returning 300, the list is 
300 long. I find it hard to believe that keeping a list 300 items long is 
making that much of a difference.

2> Solr needs to fetch/decompress/assemble 300 documents vs. 10 documents for 
the response. Regardless of the fields returned, the entire document will be 
decompressed if you return any fields that are not docValues=true. So it’s 
possible that what you’re seeing is related.

Try adding debug=true to the query, as Alexandre suggests. Pay particular 
attention to the “timings” section too; that’ll show you the time each 
component took _exclusive_ of step <2> above and should give a clue.


All that said, fq clauses don’t score, so scoring is certainly involved in why 
the query takes so long to return even 10 rows but gets faster when you move 
the clause to a filter query, but my intuition is that there’s something else 
going on as well to account for the difference when you return 300 rows.

Best,
Erick

Re: Solr 7.6 query performance question

2020-10-01 Thread raj.yadav
harjags wrote
> The below errors are very common in 7.6, and we have Solr nodes failing with
> tanking memory.
> 
> The request took too long to iterate over terms. Timeout: timeoutAt:
> 162874656583645 (System.nanoTime(): 162874701942020),
> TermsEnum=org.apache.lucene.codecs.blocktree.SegmentTermsEnum@74507f4a
> 
> or 
> 
> #BitSetDocTopFilter]; The request took too long to iterate over terms.
> Timeout: timeoutAt: 33288640223586 (System.nanoTime(): 33288700895778),
> TermsEnum=org.apache.lucene.codecs.blocktree.SegmentTermsEnum@5e458644
> 
> 
> or 
> 
> #SortedIntDocSetTopFilter]; The request took too long to iterate over
> terms.
> Timeout: timeoutAt: 552497919389297 (System.nanoTime(): 552508251053558),
> TermsEnum=org.apache.lucene.codecs.blocktree.SegmentTermsEnum@60b7186e
> 



We are also seeing such errors in our log, but our nodes are not failing, and
the frequency of such warnings is less than 5% of overall traffic.
What does this error mean?
Can someone elaborate on the following:
1. What does `The request took too long to iterate over terms` mean?
2. What are `BitSetDocTopFilter` and `SortedIntDocSetTopFilter`?

Regards,
Raj
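
For what it's worth, these messages are the ones Lucene's
ExitableDirectoryReader raises when a request runs past its timeAllowed budget,
so they usually indicate that timeAllowed is set on the request or in the
handler defaults. A minimal sketch (values hypothetical):

http://localhost:8983/solr/collection1/select?q=field_s:foo*&timeAllowed=500

Here the request is cut off (with the "took too long to iterate over terms"
warning) once it exceeds 500 ms.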





Re: Slow Solr 8 response for long query

2020-09-30 Thread Erick Erickson
Increasing the number of rows should not have this kind of impact in either 
version of Solr, so I think there’s something fundamentally strange in your 
setup.

Whether returning 10 or 300 documents, every document has to be scored. There 
are two differences between 10 and 300 rows:

1> when returning 10 rows, Solr keeps a sorted list of 10 docs, just IDs and 
scores (assuming you’re sorting by relevance); when returning 300, the list is 
300 long. I find it hard to believe that keeping a list 300 items long is 
making that much of a difference.

2> Solr needs to fetch/decompress/assemble 300 documents vs. 10 documents for 
the response. Regardless of the fields returned, the entire document will be 
decompressed if you return any fields that are not docValues=true. So it’s 
possible that what you’re seeing is related.

Try adding debug=true to the query, as Alexandre suggests. Pay particular 
attention to the “timings” section too; that’ll show you the time each 
component took _exclusive_ of step <2> above and should give a clue.


All that said, fq clauses don’t score, so scoring is certainly involved in why 
the query takes so long to return even 10 rows but gets faster when you move 
the clause to a filter query, but my intuition is that there’s something else 
going on as well to account for the difference when you return 300 rows.

Best,
Erick

> On Sep 29, 2020, at 8:52 PM, Alexandre Rafalovitch  wrote:
> 
> What do the debug versions of the query show between two versions?
> 
> One thing that changed is sow (split on whitespace) parameter among
> many. It is unlikely to be the cause, but I am mentioning just in
> case.
> https://lucene.apache.org/solr/guide/8_6/the-standard-query-parser.html#standard-query-parser-parameters
> 
> Regards,
>   Alex
> 
> On Tue, 29 Sep 2020 at 20:47, Permakoff, Vadim
>  wrote:
>> 
>> Hi Solr Experts!
>> We are moving from Solr 6.5.1 to Solr 8.5.0 and are having a problem with a
>> long query, which has search text plus many OR and AND conditions (all in
>> one place; the query is about 20KB long).
>> For the same set of data (about 500K docs) and the same schema, the query in
>> Solr 6 returns results in less than 2 sec, while Solr 8 takes more than 10
>> sec to get 10 results. If I increase the number of rows to 300, Solr 6 takes
>> about 10 sec and Solr 8 takes more than 1 min. The results are small,
>> just IDs. It looks like relevancy scoring plays a role, because if I move
>> this query to a filter query, both Solr versions work pretty fast.
>> The right way would be to change the query, but unfortunately it is
>> difficult to modify the application which creates these queries, so I want
>> to find some temporary workaround.
>> 
>> What changed from Solr 6 to Solr 8 in terms of scoring with many
>> conditions that affects search speed negatively?
>> Is there anything to configure in Solr 8 to get the same performance for
>> such a query as in Solr 6?
>> 
>> Thank you,
>> Vadim
>> 
>> 
>> 



Re: Slow Solr 8 response for long query

2020-09-29 Thread Alexandre Rafalovitch
What do the debug versions of the query show between two versions?

One thing that changed is sow (split on whitespace) parameter among
many. It is unlikely to be the cause, but I am mentioning just in
case.
https://lucene.apache.org/solr/guide/8_6/the-standard-query-parser.html#standard-query-parser-parameters

Regards,
   Alex

On Tue, 29 Sep 2020 at 20:47, Permakoff, Vadim
 wrote:
>
> Hi Solr Experts!
> We are moving from Solr 6.5.1 to Solr 8.5.0 and are having a problem with a long 
> query, which has search text plus many OR and AND conditions (all in one 
> place; the query is about 20KB long).
> For the same set of data (about 500K docs) and the same schema, the query in 
> Solr 6 returns results in less than 2 sec, while Solr 8 takes more than 10 sec to 
> get 10 results. If I increase the number of rows to 300, Solr 6 takes about 
> 10 sec and Solr 8 takes more than 1 min. The results are small, just 
> IDs. It looks like relevancy scoring plays a role, because if I move this 
> query to a filter query, both Solr versions work pretty fast.
> The right way would be to change the query, but unfortunately it is 
> difficult to modify the application which creates these queries, so I want to 
> find some temporary workaround.
>
> What changed from Solr 6 to Solr 8 in terms of scoring with many 
> conditions that affects search speed negatively?
> Is there anything to configure in Solr 8 to get the same performance for such 
> a query as in Solr 6?
>
> Thank you,
> Vadim
>
> 
>


Slow Solr 8 response for long query

2020-09-29 Thread Permakoff, Vadim
Hi Solr Experts!
We are moving from Solr 6.5.1 to Solr 8.5.0 and are having a problem with a long 
query, which has search text plus many OR and AND conditions (all in one 
place; the query is about 20KB long).
For the same set of data (about 500K docs) and the same schema, the query in 
Solr 6 returns results in less than 2 sec, while Solr 8 takes more than 10 sec to get 
10 results. If I increase the number of rows to 300, Solr 6 takes about 
10 sec and Solr 8 takes more than 1 min. The results are small, just IDs. It 
looks like relevancy scoring plays a role, because if I move this query to a 
filter query, both Solr versions work pretty fast.
The right way would be to change the query, but unfortunately it is difficult 
to modify the application which creates these queries, so I want to find some 
temporary workaround.

What changed from Solr 6 to Solr 8 in terms of scoring with many 
conditions that affects search speed negatively?
Is there anything to configure in Solr 8 to get the same performance for such 
a query as in Solr 6?

Thank you,
Vadim
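
A hedged sketch of the workaround hinted at above: if ranking by the large
boolean expression is not actually needed, keep a small scoring query in q and
push the expression into a filter query, optionally uncached since a 20KB
one-off clause is unlikely to repeat:

q=the plain search text
fq={!cache=false}(the long OR/AND boolean expression)
rows=300&fl=id

Since fq clauses are not scored, this sidesteps the relevancy-scoring cost that
appears to dominate in Solr 8.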





Re: How to use query function inside a function query in Solr LTR

2020-09-22 Thread krishan goyal
This is solved by using local parameters. So

{!func}sub(num_tokens_int,query({!dismax qf=field_name v=${text}}))

works


On Mon, Sep 21, 2020 at 7:43 PM krishan goyal  wrote:

> Hi,
>
> I have use cases of features which require a query function and some more
> math on top of the result of the query function
>
> E.g. of a feature: the number of extra terms in the document relative to the input text
>
> I am trying various ways of representing this feature but always getting
> an exception
> java.lang.RuntimeException: Exception from createWeight for SolrFeature
> . Failed to parse feature query.
>
>  Feature representations
> "name" : "no_of_extra_terms",
> "class" : "org.apache.solr.ltr.feature.SolrFeature",
> "params": {
> "q": "{!func}sub(num_tokens_int,query({!dismax
> qf=field_name}${text}))"
> },
>
> where num_tokens_int is a stored field which contains no of tokens in the
> document
>
>
> Also, feature representation with just a query parser like
>
> "q": "{!dismax df=field_name}${text}"
>
> works, but I can't really get my desired feature representation without
> using it in a function query, where I want to operate on the result of this
> query to derive my actual feature
>
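
For reference, the working fix above as a complete feature-store entry would
look roughly like this (store name hypothetical; ${text} is supplied per
request as external feature information, i.e. efi.text):

{
  "store": "myFeatureStore",
  "name": "no_of_extra_terms",
  "class": "org.apache.solr.ltr.feature.SolrFeature",
  "params": {
    "q": "{!func}sub(num_tokens_int,query({!dismax qf=field_name v=${text}}))"
  }
}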

