[jira] [Updated] (SOLR-17726) CloudMLTQParser fails to use copyFields due to RealTime Get

2025-04-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SOLR-17726:
--
Labels: morelikethis pull-request-available  (was: morelikethis)

> CloudMLTQParser fails to use copyFields due to RealTime Get
> ---
>
> Key: SOLR-17726
> URL: https://issues.apache.org/jira/browse/SOLR-17726
> Project: Solr
>  Issue Type: Bug
>  Components: MoreLikeThis
>Affects Versions: 9.8.1
>Reporter: ilariapet
>Priority: Major
>  Labels: morelikethis, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When using CloudMLTQParser (the default MLT parser in SolrCloud), fields that 
> are populated exclusively via copyField are not taken into account when 
> constructing the MoreLikeThis query.
> This happens because CloudMLTQParser relies on a RealTime Get (`/get`) 
> request to retrieve the source document by ID, and the document returned by 
> RealTime Get does not include fields generated via copyField (i.e. fields 
> that are not part of the original SolrInputDocument).
> As a result, even if the copyField target is stored and has proper 
> termVectors configured, CloudMLTQParser skips the field silently, and the MLT 
> query ends up empty.
>  
> This behavior differs from SimpleMLTQParser (used in Solr standalone), which 
> does not rely on RealTime Get but instead extracts the stored field content 
> and re-applies the analysis chain dynamically.
>  
> *STEPS TO REPRODUCE*
> 1. Define these fields in the schema.xml:
> {code:java}
> 
>  stored="true" termVectors="true"/>
>  {code}
> 2. Index a document that sets only the {{description}} field. The 
> {{descriptionMLT}} field is expected to be populated automatically via the 
> configured copyField directive.
> 3. Run an MLT query:
> {code:java}
> /select?q={!mlt qf=descriptionMLT}doc_id {code}
> 4. The resulting parsed query will be empty:
> {code:java}
> "parsedquery": "+() -documentId:32000"{code}
> If the same document is reindexed with {{descriptionMLT}} set explicitly, 
> the MLT query works.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



[jira] [Updated] (SOLR-17726) CloudMLTQParser fails to use copyFields due to RealTime Get

2025-04-15 Thread ilariapet (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ilariapet updated SOLR-17726:
-
Description: 
When using CloudMLTQParser (the default MLT parser in SolrCloud), fields that 
are populated exclusively via copyField are not taken into account when 
constructing the MoreLikeThis query.

This happens because CloudMLTQParser relies on a RealTime Get (`/get`) request 
to retrieve the source document by ID, and the document returned by RealTime 
Get does not include fields generated via copyField (i.e. fields that are not 
part of the original SolrInputDocument).

As a result, even if the copyField target is stored and has proper termVectors 
configured, CloudMLTQParser skips the field silently, and the MLT query ends up 
empty.
 
This behavior differs from SimpleMLTQParser (used in Solr standalone), which 
does not rely on RealTime Get but instead extracts the stored field content and 
re-applies the analysis chain dynamically.
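
To make the failure mode concrete, here is a toy model in plain Java with no Solr dependency. It is only a sketch of the behavior described above, not the actual CloudMLTQParser code: the class and method names are hypothetical, the sample text is made up, and real MoreLikeThis term selection (analysis, term frequencies, boosts) is reduced to whitespace splitting. It shows that a document taken as submitted has no copyField target, so a query built only from that target ends up with no terms.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CopyFieldMltSketch {

    /** Hypothetical stand-in for the document exactly as the client submitted it. */
    public static Map<String, String> submittedDoc() {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("documentId", "32000");
        doc.put("description", "solr cloud more like this"); // made-up sample text
        return doc;
    }

    /** Toy model of copyField: the destination exists only in the indexed form. */
    public static Map<String, String> applyCopyField(
            Map<String, String> doc, String source, String dest) {
        Map<String, String> indexed = new LinkedHashMap<>(doc);
        if (doc.containsKey(source)) {
            indexed.put(dest, doc.get(source));
        }
        return indexed;
    }

    /** Collects candidate terms from the requested fields; absent fields are skipped silently. */
    public static List<String> mltTerms(Map<String, String> doc, List<String> queryFields) {
        List<String> terms = new ArrayList<>();
        for (String field : queryFields) {
            String value = doc.get(field);
            if (value == null) {
                continue; // no error, no warning: this is the silent skip
            }
            for (String token : value.split("\\s+")) {
                terms.add(field + ":" + token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        Map<String, String> asSubmitted = submittedDoc();
        Map<String, String> asIndexed =
                applyCopyField(asSubmitted, "description", "descriptionMLT");
        List<String> qf = List.of("descriptionMLT");

        // The submitted form has no descriptionMLT, so no terms are produced.
        System.out.println("from submitted doc: " + mltTerms(asSubmitted, qf));
        // The indexed form carries the copied value, so terms are available.
        System.out.println("from indexed doc:   " + mltTerms(asIndexed, qf));
    }
}
```

In this model the first call yields an empty list, which corresponds to the empty `+()` clause reported in the steps to reproduce.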
 
*STEPS TO REPRODUCE*
1. Define these fields in the schema.xml:
{code:java}



 {code}
2. Index a document that sets only the {{description}} field. The 
{{descriptionMLT}} field is expected to be populated automatically via the 
configured copyField directive.
3. Run an MLT query:
{code:java}
/select?q={!mlt qf=descriptionMLT}doc_id {code}
4. The resulting parsed query will be empty:
{code:java}
"parsedquery": "+() -documentId:32000"{code}
If the same document is reindexed with {{descriptionMLT}} set explicitly, 
the MLT query works.
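
When reproducing this from a script rather than a browser, the `{!mlt ...}` local-params prefix has to be URL-encoded. The following is a minimal sketch of building the request URL. The base URL, the collection name `mycollection`, and the document ID `32000` are assumptions for illustration only; `debugQuery=true` is added so that the response includes the `parsedquery` entry shown in step 4.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class MltRequestUrl {

    /** Builds a /select URL whose q parameter is the URL-encoded {!mlt} query. */
    public static String build(String baseUrl, String collection, String qf, String docId) {
        // Same shape as the query in step 3: {!mlt qf=<field>}<document id>
        String q = "{!mlt qf=" + qf + "}" + docId;
        String encoded = URLEncoder.encode(q, StandardCharsets.UTF_8);
        return baseUrl + "/" + collection + "/select?q=" + encoded + "&debugQuery=true";
    }

    public static void main(String[] args) {
        // Base URL, collection name and document ID are placeholders.
        System.out.println(
                build("http://localhost:8983/solr", "mycollection", "descriptionMLT", "32000"));
    }
}
```

The resulting URL can be fetched with any HTTP client; an empty `+()` clause in the debug output indicates the problem described here.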
 
 
 

  was:
When using CloudMLTQParser (the default MLT parser in SolrCloud), fields that 
are populated exclusively via copyField are not taken into account when 
constructing the MoreLikeThis query.

This happens because CloudMLTQParser relies on a RealTime Get (`/get`) request 
to retrieve the source document by ID, and the document returned by RealTime 
Get does not include fields generated via copyField (i.e. fields that are not 
part of the original SolrInputDocument).

As a result, even if the copyField target has proper termVectors configured, 
CloudMLTQParser skips the field silently, and the MLT query ends up empty.
 
This behavior differs from SimpleMLTQParser (used in Solr standalone), which 
does not rely on RealTimeGet but instead extracts the stored field content and 
re-applies the analysis chain dynamically.
 
*STEPS TO REPRODUCE*
1. Define these fields in the schema.xml:
{code:java}



 {code}
2. Index a document that sets only the {{description}} field. The 
{{descriptionMLT}} field is expected to be populated automatically via the 
configured copyField directive.
3. Run an MLT query:
{code:java}
/select?q={!mlt qf=descriptionMLT}doc_id {code}
4. The resulting parsed query will be empty:
{code:java}
"parsedquery": "+() -documentId:32000"{code}
If the same document is reindexed with {{descriptionMLT}} set explicitly, 
the MLT query works.
 
 
 



[jira] [Updated] (SOLR-17726) CloudMLTQParser fails to use copyFields due to RealTime Get

2025-04-05 Thread ilariapet (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-17726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ilariapet updated SOLR-17726:
-
Summary: CloudMLTQParser fails to use copyFields due to RealTime Get  (was: 
CloudMLTQParser fails to use copyFields due to RealTimeGet)

> CloudMLTQParser fails to use copyFields due to RealTime Get
> ---
>
> Key: SOLR-17726
> URL: https://issues.apache.org/jira/browse/SOLR-17726
> Project: Solr
>  Issue Type: Bug
>  Components: MoreLikeThis
>Affects Versions: 9.8.1
>Reporter: ilariapet
>Priority: Major
>  Labels: morelikethis
>
> When using CloudMLTQParser (the default MLT parser in SolrCloud), fields that 
> are populated exclusively via copyField are not taken into account when 
> constructing the MoreLikeThis query.
> This happens because CloudMLTQParser relies on a RealTime Get (`/get`) 
> request to retrieve the source document by ID, and the document returned by 
> RealTime Get does not include fields generated via copyField (i.e. fields 
> that are not part of the original SolrInputDocument).
> As a result, even if the copyField target has proper termVectors configured, 
> CloudMLTQParser skips the field silently, and the MLT query ends up empty.
>  
> This behavior differs from SimpleMLTQParser (used in Solr standalone), which 
> does not rely on RealTimeGet but instead extracts the stored field content 
> and re-applies the analysis chain dynamically.
>  
> *STEPS TO REPRODUCE*
> 1. Define these fields in the schema.xml:
> {code:java}
> 
>  stored="true" termVectors="true"/>
>  {code}
> 2. Index a document that sets only the {{description}} field. The 
> {{descriptionMLT}} field is expected to be populated automatically via the 
> configured copyField directive.
> 3. Run an MLT query:
> {code:java}
> /select?q={!mlt qf=descriptionMLT}doc_id {code}
> 4. The resulting parsed query will be empty:
> {code:java}
> "parsedquery": "+() -documentId:32000"{code}
> If the same document is reindexed with {{descriptionMLT}} set explicitly, 
> the MLT query works.
>  
>  
>  


