[jira] [Commented] (SOLR-5351) More Like This Handler uses only first field in mlt.fl when using stream.body
[ https://issues.apache.org/jira/browse/SOLR-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490478#comment-16490478 ]

ASF subversion and git services commented on SOLR-5351:
---

Commit 41ecad9897bb8949bfed730cd988aec58aa69775 in lucene-solr's branch refs/heads/master from [~dawid.weiss]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=41ecad9 ]

SOLR-5351: Fixed More Like This Handler to use all fields provided in mlt.fl when used with content stream. The similarity is calculated between the content stream's value and all fields listed in mlt.fl.

> More Like This Handler uses only first field in mlt.fl when using stream.body
> ---
>
> Key: SOLR-5351
> URL: https://issues.apache.org/jira/browse/SOLR-5351
> Project: Solr
> Issue Type: Bug
> Components: MoreLikeThis
> Affects Versions: 4.4
> Environment: Linux,Windows
> Reporter: Zygmunt Wiercioch
> Assignee: Tommaso Teofili
> Priority: Minor
> Attachments: SOLR-5351.patch, SOLR-5351.patch
>
> The documentation at: http://wiki.apache.org/solr/MoreLikeThisHandler
> indicates that one can use multiple fields for similarity in mlt.fl:
> http://localhost:8983/solr/mlt?stream.body=electronics%20memory&mlt.fl=manu,cat&mlt.interestingTerms=list&mlt.mintf=0
> In trying this, only one field is used.
> Looking at the code, it only looks at the first field:
>   public DocListAndSet getMoreLikeThis( Reader reader, int start, int rows, List<Query> filters, List<InterestingTerm> terms, int flags ) throws IOException
>   {
>     // analyzing with the first field: previous (stupid) behavior
>     rawMLTQuery = mlt.like(reader, mlt.getFieldNames()[0]);

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5351) More Like This Handler uses only first field in mlt.fl when using stream.body
[ https://issues.apache.org/jira/browse/SOLR-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490477#comment-16490477 ]

ASF subversion and git services commented on SOLR-5351:
---

Commit d2e9ad200802801423061fe6019c8e8c6dc1b62f in lucene-solr's branch refs/heads/branch_7x from [~dawid.weiss]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d2e9ad2 ]

SOLR-5351: Fixed More Like This Handler to use all fields provided in mlt.fl when used with content stream. The similarity is calculated between the content stream's value and all fields listed in mlt.fl.
[jira] [Commented] (SOLR-5351) More Like This Handler uses only first field in mlt.fl when using stream.body
[ https://issues.apache.org/jira/browse/SOLR-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382048#comment-16382048 ]

Tommaso Teofili commented on SOLR-5351:
---

+1, thanks [~dweiss], the patch looks good to me!
[jira] [Commented] (SOLR-5351) More Like This Handler uses only first field in mlt.fl when using stream.body
[ https://issues.apache.org/jira/browse/SOLR-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381978#comment-16381978 ]

Dawid Weiss commented on SOLR-5351:
---

I implemented support for multiple fields in the MLT handler. I also corrected the test, which had bugs in it (the expected-exception block was not guarded and always fell through, effectively verifying nothing). I also cleaned up the code formatting a bit; this will obscure the patch slightly, but shouldn't be too bad.

If there are no objections, I'd like to commit it soon.
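The test problem mentioned in the comment above (an expected-exception block that falls through) can be illustrated with a small sketch. This is not the actual Solr test; `ExpectedExceptionCheck`, `ThrowingBody` and `expectThrows` are hypothetical names used only for illustration. The point is the guard at the end: if the body returns normally, the check fails instead of passing silently.

```java
/** Hypothetical illustration of a guarded expected-exception check; not the actual Solr test. */
final class ExpectedExceptionCheck {

  /** Code under test; it may throw any exception. */
  interface ThrowingBody {
    void run() throws Exception;
  }

  /**
   * Returns the thrown exception if it has the expected type.
   * Fails when the body completes normally or throws a different exception type.
   */
  static <E extends Exception> E expectThrows(Class<E> expected, ThrowingBody body) {
    try {
      body.run();
    } catch (Exception e) {
      if (expected.isInstance(e)) {
        return expected.cast(e);
      }
      throw new AssertionError("Unexpected exception type: " + e.getClass().getName(), e);
    }
    // The guard: reaching this line means nothing was thrown, so the check must fail.
    throw new AssertionError("Expected " + expected.getName() + " but nothing was thrown");
  }
}
```

Without the final `throw`, a body that never raises the exception would let the test pass, which appears to be the failure mode described in the comment.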
[jira] [Commented] (SOLR-5351) More Like This Handler uses only first field in mlt.fl when using stream.body
[ https://issues.apache.org/jira/browse/SOLR-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16365298#comment-16365298 ]

Dawid Weiss commented on SOLR-5351:
---

My plan is to apply the stream (reader) to all fields expressed in mlt.fl. If there is one field, nothing changes. If multiple fields are provided, we have to buffer the reader and then pass the buffer's content to all requested fields, effectively extracting terms of interest. Some care must be taken to apply boosting properly (I'd create a parent Boolean query with sub-clauses for each field; they can preserve all their parameters: min matches, field-boosts, etc.).
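A minimal sketch of the buffering step in the plan above, using only java.io so it runs without Solr or Lucene on the classpath. `StreamFanOut` and `FieldAnalyzer` are hypothetical names; in the real handler the per-field callback would presumably be a MoreLikeThis call whose result becomes one sub-clause of the parent Boolean query, which is not shown here.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of Solr or Lucene: replays one stream for several fields. */
final class StreamFanOut {

  /** Stand-in for per-field term extraction (assumption: one call per field in mlt.fl). */
  interface FieldAnalyzer<T> {
    T analyze(String fieldName, Reader content) throws IOException;
  }

  /** Drains the reader into memory so its content can be replayed. */
  static String bufferFully(Reader reader) throws IOException {
    StringBuilder sb = new StringBuilder();
    char[] chunk = new char[4096];
    int n;
    while ((n = reader.read(chunk)) != -1) {
      sb.append(chunk, 0, n);
    }
    return sb.toString();
  }

  /** Gives every field its own fresh reader over the same buffered text. */
  static <T> Map<String, T> forEachField(Reader reader, String[] fieldNames,
                                         FieldAnalyzer<T> analyzer) throws IOException {
    String text = bufferFully(reader);
    Map<String, T> perField = new LinkedHashMap<>();
    for (String field : fieldNames) {
      perField.put(field, analyzer.analyze(field, new StringReader(text)));
    }
    return perField;
  }
}
```

Buffering the whole stream trades memory for simplicity: each field sees an independent reader, so no field's analysis can leave the stream half-consumed for the next one.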
[jira] [Commented] (SOLR-5351) More Like This Handler uses only first field in mlt.fl when using stream.body
[ https://issues.apache.org/jira/browse/SOLR-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16365277#comment-16365277 ]

Dawid Weiss commented on SOLR-5351:
---

This issue is still present in all releases of Solr. When you use multiple mlt.fl fields and a text stream on input, only the first field is taken into account, with this comment in MoreLikeThisHandler:

{code}
public DocListAndSet getMoreLikeThis( Reader reader, int start, int rows, List<Query> filters, List<InterestingTerm> terms, int flags ) throws IOException
{
  // analyzing with the first field: previous (stupid) behavior
  rawMLTQuery = mlt.like(mlt.getFieldNames()[0], reader);
{code}

It is stupid and trappy; I'd like to fix it. If there are any reasons to keep this behavior for backward compatibility, please let me know. I assume it's just a bug (it prevents you from using mlt.qf, etc.).
[jira] [Commented] (SOLR-5351) More Like This Handler uses only first field in mlt.fl when using stream.body
[ https://issues.apache.org/jira/browse/SOLR-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557727#comment-14557727 ]

Kais Hassan commented on SOLR-5351:
---

Thanks Upayavira for your solution, I tried it with Solr 4.10 and it works well. I agree with Tommaso, this should be handled at the Lucene level.
[jira] [Commented] (SOLR-5351) More Like This Handler uses only first field in mlt.fl when using stream.body
[ https://issues.apache.org/jira/browse/SOLR-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969610#comment-13969610 ]

Upayavira commented on SOLR-5351:
---

These two snippets of code seem to work. I will recraft them as a patch soon. However, I wonder if there are more standard ways of rereading a Reader.

{code:java}
private class ResettingReader extends Reader {
  Reader wrapped;

  public ResettingReader(Reader reader) {
    wrapped = reader;
  }

  public int read(char[] buf, int off, int len) throws IOException {
    return wrapped.read(buf, off, len);
  }

  // close() rewinds instead of closing, so the next consumer can reread the content.
  // This only works if the wrapped reader supports reset().
  public void close() throws IOException {
    wrapped.reset();
  }

  public void reallyClose() throws IOException {
    wrapped.close();
  }
}
{code}

{code:java}
public DocListAndSet getMoreLikeThis( Reader reader, int start, int rows, List<Query> filters, List<InterestingTerm> terms, int flags ) throws IOException
{
  if (mlt.getFieldNames().length == 1) {
    rawMLTQuery = mlt.like(reader, mlt.getFieldNames()[0]);
  } else {
    // Build one query per field and merge their clauses into a single BooleanQuery.
    BooleanQuery mltQuery = new BooleanQuery();
    ResettingReader reader2 = new ResettingReader(reader);
    for (String fieldName : mlt.getFieldNames()) {
      BooleanQuery singleFieldQuery = (BooleanQuery) mlt.like(reader2, fieldName);
      for (BooleanClause clause : singleFieldQuery.getClauses()) {
        mltQuery.add(clause);
      }
    }
    reader2.reallyClose();
    rawMLTQuery = mltQuery;
  }
  ...
{code}
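On the question of a more standard way to reread a Reader: java.io offers mark/reset, and BufferedReader supports it for any wrapped reader. The sketch below is an assumption-laden illustration, not code from any patch on this issue; `RereadWithMark`, `drain` and `readRepeatedly` are hypothetical names. The main caveat is the read-ahead limit: reset() is only reliable while fewer than `maxChars` characters have been read since mark().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of rereading a Reader with java.io mark/reset. */
final class RereadWithMark {

  /** Reads everything that remains in the reader. */
  static String drain(Reader reader) throws IOException {
    StringBuilder sb = new StringBuilder();
    char[] chunk = new char[4096];
    int n;
    while ((n = reader.read(chunk)) != -1) {
      sb.append(chunk, 0, n);
    }
    return sb.toString();
  }

  /**
   * Reads the same content several times by marking the start and resetting after each pass.
   * Assumes the content is shorter than maxChars; beyond that limit reset() may fail.
   */
  static List<String> readRepeatedly(Reader source, int passes, int maxChars) throws IOException {
    BufferedReader buffered = new BufferedReader(source);
    buffered.mark(maxChars);
    List<String> results = new ArrayList<>();
    for (int i = 0; i < passes; i++) {
      results.add(drain(buffered));
      buffered.reset(); // rewind to the marked position for the next pass
    }
    return results;
  }
}
```

Because the content length of a stream is usually unknown up front, fully buffering the text and creating a fresh reader per field is likely the more robust choice; mark/reset mainly helps when an upper bound on the input size is known.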
[jira] [Commented] (SOLR-5351) More Like This Handler uses only first field in mlt.fl when using stream.body
[ https://issues.apache.org/jira/browse/SOLR-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925680#comment-13925680 ]

Tommaso Teofili commented on SOLR-5351:
---

To me the problem is not (only) on the Solr side; I think Lucene's MoreLikeThis should support multiple fields instead of just one, e.g. by adding a new method like

{code}
Query like(Reader r, String... fieldNames) throws IOException
{code}