[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser
[ https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277640#comment-14277640 ]

Anshum Gupta commented on SOLR-6248:

[~markus17] Thanks for the patch for 4.10, but it can't go in. It's a new feature and will be released with 5.0 (sometime really soon). I haven't looked at the patch yet, but users who are running 4.10 and want to use this patch are free to do so. We can work on getting the bug fixes/tests into 5x now, though. Can you provide a patch for trunk/5x for the tests/fixes?

MoreLikeThis Query Parser
Key: SOLR-6248
URL: https://issues.apache.org/jira/browse/SOLR-6248
Project: Solr
Issue Type: New Feature
Components: query parsers
Reporter: Anshum Gupta
Assignee: Anshum Gupta
Fix For: 5.0, Trunk
Attachments: SOLR-6248-4x.patch, SOLR-6248-4x.patch, SOLR-6248.patch, SOLR-6248.patch, SOLR-6248.patch, SOLR-6248.patch, SOLR-6248.patch, SOLR-6248.patch

The MLT component doesn't let people highlight or paginate, and the MLT handler comes with the cost of maintaining another piece in the config. Also, any changes to the defaults of the /select handler (number of results to be fetched, etc.) need to be copied/synced to this handler too.

Having an MLT QParser would let users get back docs based on a query, which they can then paginate, highlight, etc. It would also give them the flexibility to use it anywhere, e.g. q, fq, bq.

A bit of history about MLT (thanks to Hoss):

The MLT handler pre-dates the existence of QParsers and was meant to take an arbitrary query as input, find docs that match that query, club them together to find interesting terms, and then use those terms as if they were the main query to generate a main result set. This result would then be used as the set to facet, highlight, etc.

The flow: Query -> DocList(m) -> Bag (terms) -> Query -> DocList(y)

The MLT component, on the other hand, served a very different purpose: augmenting the main result set. It is used to get similar docs for each doc in the main result set.

DocSet(n) -> n * Bag (terms) -> n * (Query) -> n * DocList(m)

The new approach:

All of this can be done better and more cleanly (and it makes more sense, too) using an MLT QParser.

An important case to handle here is when the user doesn't have TermVectors; in that case, it does what happens right now, i.e. it parses stored fields. Also, in case the user doesn't have a field (to be used for MLT) indexed, the field would need to be a TextField with an index analyzer defined. This analyzer will then be used to extract terms for MLT.

In SolrCloud mode, '/get-termvectors' can be used after looking at the schema (if TermVectors are enabled for the field). If not, a /get call can be used to fetch the field and parse it.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
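The two flows in the issue description (handler: Query -> DocList(m) -> Bag (terms) -> Query; component: DocSet(n) -> n * Bag (terms) -> n * Query) differ mainly in whether the matching docs' terms are pooled into one bag or kept per doc. The Python sketch below illustrates that difference over a tiny in-memory corpus. It is a simplification, not Lucene's MoreLikeThis: `analyze` is a toy stand-in for a field's index analyzer, terms are ranked by a plain tf * log(N/df) score, and all names (`CORPUS`, `interesting_terms`, `handler_flow`, `component_flow`) are hypothetical.

```python
import math
import re
from collections import Counter

# Toy "index": uniqueKey -> stored text (hypothetical sample data).
CORPUS = {
    "1": "solr mlt parser mlt mlt",
    "2": "the solr mlt handler",
    "3": "solr cloud shard",
    "4": "solr highlight paginate",
}

STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for"}


def analyze(text):
    """Toy stand-in for an index analyzer: lowercase, tokenize, drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def interesting_terms(bag, corpus, max_terms=3):
    """Score each term in the bag by tf * log(N / df) and keep the top max_terms."""
    n_docs = len(corpus)
    df = Counter()
    for text in corpus.values():
        df.update(set(analyze(text)))
    scored = []
    for term, tf in bag.items():
        if df[term] == 0:
            continue  # term not present in the index at all
        score = tf * math.log(n_docs / df[term])
        if score > 0:  # a term found in every doc carries no signal
            scored.append((score, term))
    # Highest score first; ties broken alphabetically so output is stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:max_terms]]


def handler_flow(matching_ids, corpus, max_terms=3):
    """MLT handler: DocList(m) -> one pooled Bag (terms) -> one Query."""
    bag = Counter()
    for doc_id in matching_ids:
        bag.update(analyze(corpus[doc_id]))
    return " OR ".join(interesting_terms(bag, corpus, max_terms))


def component_flow(result_ids, corpus, max_terms=3):
    """MLT component: DocSet(n) -> n Bags (terms) -> n Queries, one per result doc."""
    return {
        doc_id: " OR ".join(interesting_terms(Counter(analyze(corpus[doc_id])), corpus, max_terms))
        for doc_id in result_ids
    }


if __name__ == "__main__":
    print(handler_flow(["1", "2"], CORPUS))    # one query for the whole match set
    print(component_flow(["1", "3"], CORPUS))  # one query per result doc
```

Pooling docs 1 and 2 yields a single query for the whole match set, while the per-doc version yields one query per result. Lucene's MoreLikeThis returns a Query object built from the selected terms; the OR string here is only for readability.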
[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser
[ https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197204#comment-14197204 ]

ASF subversion and git services commented on SOLR-6248:

Commit 1636784 from [~anshumg] in branch 'dev/trunk'
[ https://svn.apache.org/r1636784 ]
SOLR-6248: Fixing an exception in case of missing qf
[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser
[ https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197237#comment-14197237 ]

ASF subversion and git services commented on SOLR-6248:

Commit 1636788 from [~anshumg] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1636788 ]
SOLR-6248: Fixing an exception in case of missing qf (merge from trunk)
[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser
[ https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189236#comment-14189236 ]

ASF subversion and git services commented on SOLR-6248:

Commit 1635329 from [~anshumg] in branch 'dev/trunk'
[ https://svn.apache.org/r1635329 ]
SOLR-6248: Changing the format of mlt query parser
[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser
[ https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189235#comment-14189235 ]

Anshum Gupta commented on SOLR-6248:

After a discussion with Hoss, I'm changing the format of the query parser. The request won't have an 'id' key anymore, i.e. the new request would look like:

{quote}
{!mlt qf=fieldname}docId
{quote}

This eliminates the need to document, maintain, and track a new parameter name.
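With the request format described in the comment above, a /select request that runs the MLT QParser and still gets ordinary pagination and highlighting could be assembled as in this Python sketch. The base URL, the collection name `collection1`, the field `body`, and the helper names are hypothetical; only the `{!mlt qf=...}docId` local-params syntax comes from this thread. The `fq` variant reflects the issue description's point that the parser can be used in q, fq, bq, etc. A single qf field is used, matching the example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical /select endpoint; substitute your own host and collection.
SELECT_URL = "http://localhost:8983/solr/collection1/select"


def mlt_params(doc_id, field, use_as="q", start=0, rows=10, highlight=False):
    """Request params that run the MLT QParser for the doc whose uniqueKey is doc_id."""
    mlt_query = "{!mlt qf=%s}%s" % (field, doc_id)
    # Ordinary /select paging applies to the MLT results.
    params = {"start": str(start), "rows": str(rows)}
    if use_as == "q":
        params["q"] = mlt_query
    elif use_as == "fq":
        params["q"] = "*:*"  # match all docs, then keep only the MLT matches
        params["fq"] = mlt_query
    else:
        raise ValueError("use_as must be 'q' or 'fq'")
    if highlight:
        params["hl"] = "true"
    return params


def mlt_select_url(base_url, doc_id, field, **kwargs):
    """Full GET URL; urlencode percent-escapes the braces, '!' and '='."""
    return base_url + "?" + urlencode(mlt_params(doc_id, field, **kwargs))


if __name__ == "__main__":
    url = mlt_select_url(SELECT_URL, "doc42", "body", start=10, rows=5, highlight=True)
    print(url)
    print(parse_qs(urlsplit(url).query))  # decodes back to the raw params
```

Because the MLT query is just another QParser, the rest of the /select setup (defaults, paging, highlighting) applies without maintaining a separate MLT handler, which is the motivation given in the issue description.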
[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser
[ https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189268#comment-14189268 ]

ASF subversion and git services commented on SOLR-6248:

Commit 1635336 from [~anshumg] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1635336 ]
SOLR-6248: Changing request format for mlt queryparser (merge from trunk)
[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser
[ https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186775#comment-14186775 ]

Noble Paul commented on SOLR-6248:

Doesn't it make sense to put an example query in the description?
[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser
[ https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186848#comment-14186848 ]

Noble Paul commented on SOLR-6248:

I guess it is good to go.
[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser
[ https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187205#comment-14187205 ]

ASF subversion and git services commented on SOLR-6248:

Commit 1634937 from [~anshumg] in branch 'dev/trunk'
[ https://svn.apache.org/r1634937 ]
SOLR-6248: MoreLikeThis QParser that works in standalone/cloud mode
[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser
[ https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187217#comment-14187217 ]

Markus Jelsma commented on SOLR-6248:

Anshum, very cool stuff here!

bq. {!mlt id=docId qf=fieldNames}

I assume this is not the Lucene DocID but the document's UniqueKey field value? Also, must we query the correct shard for it to work?
[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser
[ https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187220#comment-14187220 ]

ASF subversion and git services commented on SOLR-6248:

Commit 1634939 from [~anshumg] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1634939 ]
SOLR-6248: MoreLikeThis QParser that works in standalone/cloud mode (merge from trunk)
[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser
[ https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187234#comment-14187234 ]

Anshum Gupta commented on SOLR-6248:

Thanks [~markus17]. This is indeed the document's unique key field value. Also, I don't think you'd need to target the correct shard since, in Cloud mode, it uses the /get handler. This has a lot of room for improvements/enhancements, but I thought this was a good point to start with.
[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser
[ https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187241#comment-14187241 ] Anshum Gupta commented on SOLR-6248:
[~noble.paul] Thanks for looking at the patch. I've added a sample query in there. Also, there's a basic description as part of the package.html. I'll also be adding the usage to the ref guide.
[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser
[ https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187244#comment-14187244 ] ASF subversion and git services commented on SOLR-6248:
Commit 1634941 from [~anshumg] in branch 'dev/trunk' [ https://svn.apache.org/r1634941 ] SOLR-6248: Removing svn:keywords that got auto-added with the commit hook.
[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser
[ https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187247#comment-14187247 ] ASF subversion and git services commented on SOLR-6248:
Commit 1634942 from [~anshumg] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1634942 ] SOLR-6248: Removing svn:keywords that got auto-added with the commit hook. (merge from trunk)
[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser
[ https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186323#comment-14186323 ] Erik Hatcher commented on SOLR-6248:
[~anshumg] - your latest patch does not have the new files (you need to svn add them). This new qparser, IMO, should be registered automatically in QParserPlugin, so it doesn't need to be registered manually in solrconfig.xml. Overall looks great (looking back at previous patches to see the new files)! +1
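To illustrate why built-in registration removes the manual solrconfig.xml step, here is a toy Java model of a parser registry in which built-in names are always present and configured entries are layered on top. It is a hypothetical sketch, not Solr's QParserPlugin code: ToyParserRegistry, ToyParser and the string each parser returns are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Toy model only: NOT Solr's QParserPlugin. All names here are hypothetical.
class ToyParserRegistry {
    interface ToyParser { String describe(String query); }

    // Built-in parsers: usable by name without any configuration entry.
    private static final Map<String, Supplier<ToyParser>> BUILT_INS = new HashMap<>();
    static {
        BUILT_INS.put("lucene", () -> q -> "lucene:" + q);
        BUILT_INS.put("mlt", () -> q -> "mlt:" + q);
    }

    private final Map<String, Supplier<ToyParser>> parsers;

    // Configured parsers (the solrconfig.xml analogue) are added on top of the
    // built-ins; in this toy they may add new names or replace existing ones.
    ToyParserRegistry(Map<String, Supplier<ToyParser>> configured) {
        parsers = new HashMap<>(BUILT_INS);
        parsers.putAll(configured);
    }

    // Resolves a name such as the "mlt" in {!mlt ...}; unknown names fail loudly.
    ToyParser get(String name) {
        Supplier<ToyParser> supplier = parsers.get(name);
        if (supplier == null) {
            throw new IllegalArgumentException("Unknown query parser: " + name);
        }
        return supplier.get();
    }
}
```

With an empty configuration, get("mlt") still resolves, which is the behavior being asked for; only parsers outside the built-in set would need a config entry.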
[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser
[ https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186379#comment-14186379 ] Anshum Gupta commented on SOLR-6248:
[~ehatcher] Thanks for looking at it. I merged the changes into MoreLikeThis.java instead of duplicating code (so those files are actually gone). The patch has everything that's required, but yes, I'll have this registered automatically in QParserPlugin.
[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser
[ https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142063#comment-14142063 ] Upayavira commented on SOLR-6248:
After a conversation with Hoss, I have also knocked up an MLT query parser. I very much doubt your query parser will allow you to pass in a stream.body, because the first few lines of o.a.s.handler.component.SearchHandler.handleRequestBody() say:
{code:java}
// Reject any request that carries a content stream (e.g. stream.body)
if (req.getContentStreams() != null && req.getContentStreams().iterator().hasNext()) {
  throw new SolrException(ErrorCode.BAD_REQUEST, "Search requests cannot accept content streams");
}
{code}
This check needs to be removed for stream.body to be available to the query parser. I can post my patch later if anyone is interested; it doesn't have any tests yet. My next task is to work out how to make it work across cores (recommend docs in one core based upon docs in another).
Regarding the patch in this ticket, I'm curious why you needed a SolrCloud-specific query parser. Is it because the doc you are using might be in a different shard? Also, from a cursory look, LWMoreLikeThis appears to be a fork of Lucene's MoreLikeThis class. Is there a reason it is needed, and if so, why isn't it still in Lucene? I expect to be working on my own version this week, and if what I produce can be useful to others (via this ticket or otherwise), I'd be happy to contribute it. Thx!
[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser
[ https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074360#comment-14074360 ] Steve Molloy commented on SOLR-6248:
In this case, it cannot replace the current MoreLikeThisHandler implementation, which can analyze incoming text (as opposed to searching for a matching document in the index) in order to find similar documents in the index. Querying by unique field and returning similar documents is already covered by the MoreLikeThisComponent if you use rows=1 to get a single document and its set of similar ones. The use case that currently forces the use of the MoreLikeThisHandler (at least that I know of) is really this on-the-fly analysis of text that is nowhere in the index.
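For comparison, the component route with rows=1 can be sketched as a /select query string. This is a minimal sketch, not code from the patch: mlt, mlt.fl and mlt.count are MoreLikeThisComponent parameters, while the helper name and the "id"/"text" field names in the usage are illustrative assumptions. It needs Java 10+ for URLEncoder.encode(String, Charset).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: fetch one document by unique key and ask the
// MoreLikeThisComponent for documents similar to it.
class MltComponentRequest {
    // keyValue is assumed to need no query-syntax escaping.
    static String queryString(String uniqueKeyField, String keyValue, String mltField, int count) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("q", uniqueKeyField + ":" + keyValue);
        params.put("rows", "1");                 // a single main result...
        params.put("mlt", "true");               // ...plus its similar documents
        params.put("mlt.fl", mltField);          // field(s) to mine for interesting terms
        params.put("mlt.count", Integer.toString(count));
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }
}
```

For example, queryString("id", "17", "text", 5) yields q=id%3A17&rows=1&mlt=true&mlt.fl=text&mlt.count=5. The similar documents come back in a separate section of the response rather than as the paginated main result list, which is the limitation the query parser is meant to remove.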
[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser
[ https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074514#comment-14074514 ] Anshum Gupta commented on SOLR-6248:
My bad; the last time I'd looked at this patch was about 10 months ago. This works like a component but also lets you paginate and do other stuff with it. Let me check whether accepting text would make sense here (or whether we could have something along similar lines).
[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser
[ https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073275#comment-14073275 ] Steve Molloy commented on SOLR-6248:
I'd like to give this a spin, but looking at the attached patch, it's unclear how to pass in text. The parsers seem to be looking at the id parameter; I haven't seen any reference to stream.body. What parameter would be used to pass in text to be analyzed and for which to return similar documents?
[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser
[ https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073671#comment-14073671 ] Vitaliy Zhovtyuk commented on SOLR-6248:
With the current implementation in the patch, the mlt qparser can match a document by the unique key field configured in the schema and find documents similar to it. The parser syntax now looks like {code}{!mlt id=17 qf=lowerfilt}lowerfilt:*{code} where id is the value of the configured unique key field (not a column named id in the schema) and qf lists the fields to match on. To support passing text, this parser could be extended with a text parameter: search for a document by that term, then look for similar documents using the existing implementation.
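The local-params string in the syntax above can also be assembled programmatically. A minimal sketch: MltLocalParams is a hypothetical helper, the id/qf parameters follow this patch revision's syntax (which may change before commit), and the quoting rule (single quotes with backslash escapes for values containing whitespace, quotes, backslashes or '}') is an assumption about Solr's local-params parsing.

```java
import java.util.Map;

// Hypothetical helper that formats "{!mlt k1=v1 k2=v2}body".
class MltLocalParams {
    static String format(Map<String, String> params, String body) {
        StringBuilder sb = new StringBuilder("{!mlt");
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(' ').append(e.getKey()).append('=').append(quoteIfNeeded(e.getValue()));
        }
        return sb.append('}').append(body).toString();
    }

    // Plain values pass through; anything that could break the local-params
    // syntax is wrapped in single quotes with '\' and '\'' backslash-escaped.
    static String quoteIfNeeded(String v) {
        boolean plain = !v.isEmpty() && v.chars().noneMatch(c ->
                Character.isWhitespace(c) || c == '\'' || c == '"' || c == '\\' || c == '}');
        if (plain) {
            return v;
        }
        return "'" + v.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }
}
```

Passing an insertion-ordered map with id=17 and qf=lowerfilt plus the body lowerfilt:* reproduces {!mlt id=17 qf=lowerfilt}lowerfilt:*, while a multi-field qf such as "name text" would come out as qf='name text'. The result still has to be URL-encoded when sent as the q parameter.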
[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser
[ https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14071645#comment-14071645 ] Steve Molloy commented on SOLR-6248:
I meant passing in text as a parameter, as opposed to finding it in the index. With the current MLT handler (not the component), you can pass it in as the body or as stream.body to get documents similar to the text you pass in. In our case, we use it to find documents in one collection similar to a document found in another, or to some text provided directly by the user. I know that at some point the SearchHandler started rejecting search requests with a stream body, which would prevent this unless it could be achieved in another way. That's why I'm asking. :)
[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser
[ https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14071753#comment-14071753 ] Anshum Gupta commented on SOLR-6248: I don't think this would work across two collections out of the box, but yes, as long as you have 'text' to pass, that is exactly what this parser would take. In other words, for now it would largely keep the handler's mechanism, but in a way that also works in SolrCloud mode.
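To illustrate what "taking text" means under the handler's mechanism: MLT turns the supplied text into a bag of interesting terms and ORs those terms together into a query. The following is a minimal Python sketch, not Lucene's MoreLikeThis implementation. It assumes a regex tokenizer in place of the field's index analyzer, document frequencies passed in as a plain dict (a real implementation would read them from the index), and a classic tf-idf score. Lucene applies further thresholds and per-term boosts that are not modeled here.

```python
import math
import re
from collections import Counter


def interesting_terms(text, doc_freq, num_docs, max_terms=5, min_tf=1):
    """Return the top terms of `text`, ranked by a simplified tf-idf score.

    doc_freq maps term -> number of indexed docs containing it. Terms that do
    not occur in the index are skipped, since they cannot match anything.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())  # stand-in for an analyzer
    scored = []
    for term, tf in Counter(tokens).items():
        df = doc_freq.get(term, 0)
        if tf < min_tf or df == 0:
            continue
        idf = 1.0 + math.log(num_docs / (df + 1))  # classic Lucene-style idf
        scored.append((tf * idf, term))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [term for _, term in scored[:max_terms]]


def to_query(terms, field):
    """OR the interesting terms together as a query on a single field."""
    return " OR ".join(f"{field}:{t}" for t in terms)


df = {"solr": 10, "query": 50, "parser": 5, "the": 100}  # made-up statistics
terms = interesting_terms("The Solr query parser parses a Solr query", df, 100, max_terms=3)
print(to_query(terms, "body"))  # prints body:solr OR body:parser OR body:query
```

Because the result is an ordinary query string, it can be used wherever a query is accepted, which is the flexibility the parser approach is after.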
[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser
[ https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14070897#comment-14070897 ] Anshum Gupta commented on SOLR-6248: What do you mean by text that isn't in the index? If you mean passing in pseudo-random text to find documents similar to it, then yes, it would handle that.
[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser
[ https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068501#comment-14068501 ] Steve Molloy commented on SOLR-6248: Would that approach also support sending in text that isn't in the index? This is the main reason we're using the MLT handler, which we need to be distributed (hence SOLR-5480). But if we can have a single approach for both, I agree that not maintaining two configurations (and two handlers in the code) would be much better. Let me know if I can help out.