[jira] [Commented] (SOLR-12879) Query Parser for MinHash/LSH

2019-03-29 Thread Andy Hind (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805000#comment-16805000
 ] 

Andy Hind commented on SOLR-12879:
--

Yes, there are two parts to the doc update: one for the MinHash filter in Lucene, 
the other for the related query parser in Solr.

> Query Parser for MinHash/LSH
> 
>
> Key: SOLR-12879
> URL: https://issues.apache.org/jira/browse/SOLR-12879
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 8.0
>Reporter: Andy Hind
>Assignee: Tommaso Teofili
>Priority: Major
> Fix For: 8.0
>
> Attachments: minhash.filter.adoc.fragment, minhash.patch, 
> minhash.qparser.adoc.fragment
>
>
> Following on from https://issues.apache.org/jira/browse/LUCENE-6968, provide 
> a query parser that builds queries that provide a measure of Jaccard 
> similarity. The initial patch includes banded queries that were also proposed 
> on the original issue.
>  
> I have one outstanding question:
>  * Should the score from the overall query be normalised?
> Note that the band count is currently approximate and may be one less than 
> in practice.
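For readers new to the banded (LSH) queries mentioned in the description, the usual banding arithmetic is sketched below. This is the textbook formula, not code taken from minhash.patch, and the function names and the `true_positive` recall target are hypothetical. A signature of `n` hashes split into bands of `r` rows yields `n // r` complete bands, and two documents with Jaccard similarity `s` match in at least one band with probability `1 - (1 - s**r)**b`.

```python
def band_count(num_hashes: int, rows_per_band: int) -> int:
    """Number of complete bands of `rows_per_band` hashes in a signature of `num_hashes`."""
    # Leftover hashes that cannot fill a whole band are not used in this sketch.
    return num_hashes // rows_per_band


def candidate_probability(jaccard: float, rows_per_band: int, bands: int) -> float:
    """Probability that two documents with this Jaccard similarity match in at least one band."""
    # A single band matches only if every row in it agrees: jaccard ** rows_per_band.
    return 1.0 - (1.0 - jaccard ** rows_per_band) ** bands


def rows_for_target(num_hashes: int, jaccard: float, true_positive: float) -> int:
    """Widest band that still finds pairs at `jaccard` with probability >= true_positive.

    Wider bands reject more dissimilar pairs, so the widest band meeting the
    (hypothetical) recall target is preferred. Falls back to 1 if none does.
    """
    best = 1
    for rows in range(1, num_hashes + 1):
        bands = band_count(num_hashes, rows)
        if candidate_probability(jaccard, rows, bands) >= true_positive:
            best = rows
        else:
            break  # the probability never rises again as bands get wider
    return best


if __name__ == "__main__":
    n = 512
    r = rows_for_target(n, jaccard=0.9, true_positive=0.9)
    b = band_count(n, r)
    print(r, b, round(candidate_probability(0.9, r, b), 3))
```

The early `break` is safe because, for a fixed similarity below 1, widening a band both lowers the per-band match probability and never increases the number of bands.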



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12879) Query Parser for MinHash/LSH

2019-03-29 Thread Cassandra Targett (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804956#comment-16804956
 ] 

Cassandra Targett commented on SOLR-12879:
--

bq. I do not see the docs for this updated/added in 8.0 ...

It looks like the doc patches were never committed. We haven't done the 8.0 
Ref Guide yet, so I can review them and commit them so they can be included. I 
didn't follow this issue, so just to confirm: were both of the {{*.adoc.fragment}} 
patches intended to be added?


[jira] [Commented] (SOLR-12879) Query Parser for MinHash/LSH

2019-03-28 Thread Andy Hind (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804303#comment-16804303
 ] 

Andy Hind commented on SOLR-12879:
--

I do not see the docs for this updated/added in 8.0 ...


[jira] [Commented] (SOLR-12879) Query Parser for MinHash/LSH

2018-11-06 Thread Tommaso Teofili (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676322#comment-16676322
 ] 

Tommaso Teofili commented on SOLR-12879:


[~andyhind] I think a separate issue is not needed.
The above doc for the _MinHashFilter_ looks good to me.
Would you also be able to provide some documentation for this query parser?
It would be good to document end-to-end usage of the query parser in combination 
with the filter, if possible.



[jira] [Commented] (SOLR-12879) Query Parser for MinHash/LSH

2018-10-30 Thread Andy Hind (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16668841#comment-16668841
 ] 

Andy Hind commented on SOLR-12879:
--

Should I raise separate issues for the documentation?


[jira] [Commented] (SOLR-12879) Query Parser for MinHash/LSH

2018-10-25 Thread Tommaso Teofili (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663405#comment-16663405
 ] 

Tommaso Teofili commented on SOLR-12879:


It should be back to green now; thanks [~steve_rowe] for the heads-up.


[jira] [Commented] (SOLR-12879) Query Parser for MinHash/LSH

2018-10-25 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663403#comment-16663403
 ] 

ASF subversion and git services commented on SOLR-12879:


Commit 26e14986af7aa60b72940f611f63b2a50fbb9980 in lucene-solr's branch 
refs/heads/master from [~teofili]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=26e1498 ]

SOLR-12879 - added missing test for min_hash qp to QueryEqualityTest



[jira] [Commented] (SOLR-12879) Query Parser for MinHash/LSH

2018-10-24 Thread Steve Rowe (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662504#comment-16662504
 ] 

Steve Rowe commented on SOLR-12879:
---

{{QueryEqualityTest}} is failing 100% of the time w/o a seed, e.g. from 
[https://builds.apache.org/job/Lucene-Solr-Tests-master/2895]:

{noformat}
Checking out Revision 3e89b7a771639aacaed6c21406624a2b27231dd7 (refs/remotes/origin/master)
[...]
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=QueryEqualityTest -Dtests.seed=40E6483843AE2CD1 -Dtests.multiplier=2 -Dtests.slow=true -Dtests.locale=en-SG -Dtests.timezone=America/Los_Angeles -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] ERROR   0.00s J1 | QueryEqualityTest (suite) <<<
   [junit4]    > Throwable #1: java.lang.AssertionError: testParserCoverage was run w/o any other method explicitly testing qparser: min_hash
   [junit4]    >    at __randomizedtesting.SeedInfo.seed([40E6483843AE2CD1]:0)
   [junit4]    >    at org.apache.solr.search.QueryEqualityTest.afterClassParserCoverageTest(QueryEqualityTest.java:59)
   [junit4]    >    at java.lang.Thread.run(Thread.java:748)
{noformat}


[jira] [Commented] (SOLR-12879) Query Parser for MinHash/LSH

2018-10-23 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661599#comment-16661599
 ] 

ASF subversion and git services commented on SOLR-12879:


Commit 9df96d2530ed7098549cbd8bda2b347f8c26042b in lucene-solr's branch 
refs/heads/jira/http2 from [~teofili]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9df96d2 ]

SOLR-12879 - added missing attribution in CHANGES.txt



[jira] [Commented] (SOLR-12879) Query Parser for MinHash/LSH

2018-10-23 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661600#comment-16661600
 ] 

ASF subversion and git services commented on SOLR-12879:


Commit 2e757f6c257687ab713f88b6a07cf4a355e4cf66 in lucene-solr's branch 
refs/heads/jira/http2 from [~teofili]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2e757f6 ]

SOLR-12879 - registered MinHashQParserPlugin to QParserPlugin as min_hash



[jira] [Commented] (SOLR-12879) Query Parser for MinHash/LSH

2018-10-23 Thread Andy Hind (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16660681#comment-16660681
 ] 

Andy Hind commented on SOLR-12879:
--

MinHash Filter doc ...

 

{quote}

== MinHash Filter

Generates a fixed number of repeatably random hash tokens from all the input 
tokens in the stream.
To do this, it first consumes all of the input tokens from its source.
This filter would normally be preceded by a <>, as shown in the 
example below.

Each input token is hashed. It is subsequently "rehashed" `hashCount` times by 
combining it with a set of precomputed hashes.
For each of the resulting hashes, the hash space is divided into `bucketCount` 
buckets. The lowest set of `hashSetSize` hashes (usually a set of one) 
is generated for each bucket.

This filter generates one type of signature or sketch for the input tokens and 
can be used to compute Jaccard similarity between documents.


*Arguments:*

`hashCount`:: (integer) the number of hashes to use. The default is 1.

`bucketCount`:: (integer) the number of buckets to use. The default is 512.

`hashSetSize`:: (integer) the size of the set for the lowest hashes from each 
bucket. The default is 1.

`withRotation`:: (boolean) if a hash bucket is empty, generate a hash value 
from the first previous bucket that has a value.
 The default is true if the bucket count is greater than 1 and false otherwise.


The number of hashes generated depends on the options above. With the default 
setting for `withRotation`, the number of hashes generated is 
`hashCount` x `bucketCount` x `hashSetSize`, which is 512 with the default values.

*Example:*

[source,xml]


 
 
 
 



*In:* "woof woof woof woof woof"

*Tokenizer to Filter:* "woof woof woof woof woof"

*Out:* "℁팽徭聙↝ꇁ홱杯", "℁팽徭聙↝ꇁ홱杯", "℁팽徭聙↝ꇁ홱杯",  a total of 512 times

{quote}
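The filter description above can be illustrated with a short Python sketch. It is not the Lucene `MinHashFilter` source: the token hash (BLAKE2b truncated to 64 bits), the way each token hash is combined with the precomputed values (XOR followed by a splitmix64-style mixer), and the wrap-around when the first bucket is empty are all assumptions made for the sketch, and `min_hash_tokens` is a hypothetical name. Output values are plain integers rather than the encoded strings shown in the example.

```python
import hashlib

MASK64 = (1 << 64) - 1


def _hash64(token: str) -> int:
    """Deterministic 64-bit token hash (a stand-in, not Lucene's hash function)."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def _mix64(x: int) -> int:
    """splitmix64 finaliser: scrambles a 64-bit value after it is combined with a seed."""
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)


def min_hash_tokens(tokens, hash_count=1, bucket_count=512, hash_set_size=1,
                    with_rotation=None):
    """Sketch of the bucketed MinHash signature described in the filter doc."""
    if with_rotation is None:
        # Documented default: rotate only when there is more than one bucket.
        with_rotation = bucket_count > 1
    # One precomputed value per "rehash"; each token hash is combined with it.
    seeds = [_mix64(i + 1) for i in range(hash_count)]
    # buckets[h][b] collects the distinct hash values landing in bucket b for rehash h.
    buckets = [[set() for _ in range(bucket_count)] for _ in range(hash_count)]
    for token in set(tokens):  # the whole stream is consumed before anything is emitted
        base = _hash64(token)
        for h, seed in enumerate(seeds):
            value = _mix64(base ^ seed)
            # Split the 64-bit hash space into bucket_count equal ranges.
            buckets[h][(value * bucket_count) >> 64].add(value)

    out = []
    for per_hash in buckets:
        # Keep the lowest hash_set_size values in each bucket.
        lowest = [sorted(values)[:hash_set_size] for values in per_hash]
        for b in range(bucket_count):
            chosen = lowest[b]
            if not chosen and with_rotation:
                # Empty bucket: reuse the first earlier non-empty bucket, wrapping around.
                for step in range(1, bucket_count):
                    candidate = lowest[(b - step) % bucket_count]
                    if candidate:
                        chosen = candidate
                        break
            out.extend(chosen)
    return out
```

With a single distinct input token, such as the one shingle "woof woof woof woof woof", only one bucket receives a value and rotation fills the other 511, so `min_hash_tokens(["woof woof woof woof woof"])` returns 512 identical values, matching the output count in the example. Without rotation, the same call returns just one value.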

 

 


[jira] [Commented] (SOLR-12879) Query Parser for MinHash/LSH

2018-10-23 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16660194#comment-16660194
 ] 

ASF subversion and git services commented on SOLR-12879:


Commit 2e757f6c257687ab713f88b6a07cf4a355e4cf66 in lucene-solr's branch 
refs/heads/master from [~teofili]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2e757f6 ]

SOLR-12879 - registered MinHashQParserPlugin to QParserPlugin as min_hash



[jira] [Commented] (SOLR-12879) Query Parser for MinHash/LSH

2018-10-23 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16660193#comment-16660193
 ] 

ASF subversion and git services commented on SOLR-12879:


Commit 9df96d2530ed7098549cbd8bda2b347f8c26042b in lucene-solr's branch 
refs/heads/master from [~teofili]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9df96d2 ]

SOLR-12879 - added missing attribution in CHANGES.txt



[jira] [Commented] (SOLR-12879) Query Parser for MinHash/LSH

2018-10-23 Thread Tommaso Teofili (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16660186#comment-16660186
 ] 

Tommaso Teofili commented on SOLR-12879:


+1 for backporting to 7.x branch.

bq. the parser could potentially be given a default name of (say) minhash and 
included in the standard plugins i.e. 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.5.0/solr/core/src/java/org/apache/solr/search/QParserPlugin.java#L46

good point, +1

bq. The solr/CHANGES.txt entry lacks the customary attribution, just an 
oversight I'm sure and easily fixed.

yes, sorry! I'll fix it right away.


[jira] [Commented] (SOLR-12879) Query Parser for MinHash/LSH

2018-10-22 Thread Andy Hind (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659793#comment-16659793
 ] 

Andy Hind commented on SOLR-12879:
--

I don't think there is any reason the patch could not go back to 7.x. It has no 
dependencies other than the analyser. It started life on 6.x, where it needed 
to disable query coordination.

The parser is mostly intended to be used with the q and fq parameters. A default 
wire-up would be great.

I would not be surprised if someone comes up with a use in streaming expressions, 
as it provides another distance measure.

I will look at adding the docs. The analyser should also have some explanation. 
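Because the fraction of agreeing signature positions estimates Jaccard similarity J, the distance measure mentioned above is simply 1 - J. The sketch below uses classic k-hash MinHash (k independently seeded hashes, keeping the minimum for each) rather than the bucketed scheme of the MinHash filter, purely for brevity; the function names and the seeded BLAKE2b hash are assumptions of the sketch, not part of the patch.

```python
import hashlib


def _seeded_hash(token: str, seed: int) -> int:
    """64-bit hash of a token under a given seed (BLAKE2b is an arbitrary choice here)."""
    data = seed.to_bytes(4, "big") + token.encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, k=256):
    """Classic MinHash: for each of k seeded hashes, keep the minimum over the token set."""
    distinct = set(tokens)
    if not distinct:
        raise ValueError("need at least one token")
    return [min(_seeded_hash(t, seed) for t in distinct) for seed in range(k)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of positions where two signatures agree; an estimate of Jaccard similarity."""
    if len(sig_a) != len(sig_b) or not sig_a:
        raise ValueError("signatures must be non-empty and the same length")
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def jaccard_distance(sig_a, sig_b):
    """Estimated Jaccard distance, 1 - J."""
    return 1.0 - estimate_jaccard(sig_a, sig_b)


def exact_jaccard(a, b):
    """Exact |A & B| / |A | B| for comparison."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

For two token sets sharing 50 of 150 distinct tokens, `exact_jaccard` gives 1/3, and the signature estimate with k = 256 typically lands within a few hundredths of that.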

 

 


[jira] [Commented] (SOLR-12879) Query Parser for MinHash/LSH

2018-10-22 Thread Christine Poerschke (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659170#comment-16659170
 ] 

Christine Poerschke commented on SOLR-12879:


Late to the party here. Hello.

* Would it be possible to backport to branch_7x too? LUCENE-6968 mentioned 
above appears to be in 7.0, but perhaps there are other dependencies? During the 
Lucene Hackday in Montreal [~andyhind] explained a little about what this logic 
does, and I think it could be of interest to folks on the upcoming 7.6 
release too.

* Is the intended use case for this query parser primarily direct e.g. via the 
{{q}} and {{fq}} parameters or indirect somehow e.g. via streaming expressions? 
If the use case is direct:
** the parser could potentially be given a default name of (say) {{minhash}} 
and included in the standard plugins i.e. 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.5.0/solr/core/src/java/org/apache/solr/search/QParserPlugin.java#L46
*** Users (and tests) would not need to configure {{}} then.
** the parser could be included in the Solr Reference Guide e.g. the 
http://lucene.apache.org/solr/guide/7_5/other-parsers.html section which is 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.5.0/solr/solr-ref-guide/src/other-parsers.adoc
 in version control.

* The solr/CHANGES.txt entry lacks the customary attribution, just an oversight 
I'm sure and easily fixed.


[jira] [Commented] (SOLR-12879) Query Parser for MinHash/LSH

2018-10-22 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658733#comment-16658733
 ] 

ASF subversion and git services commented on SOLR-12879:


Commit a7c9c9d8cefc5115a058c0d443f3e1d1d8e51b5e in lucene-solr's branch 
refs/heads/jira/http2 from [~teofili]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a7c9c9d ]

SOLR-12879 - MinHash query parser



[jira] [Commented] (SOLR-12879) Query Parser for MinHash/LSH

2018-10-20 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657752#comment-16657752
 ] 

ASF subversion and git services commented on SOLR-12879:


Commit a7c9c9d8cefc5115a058c0d443f3e1d1d8e51b5e in lucene-solr's branch 
refs/heads/master from [~teofili]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a7c9c9d ]

SOLR-12879 - MinHash query parser



[jira] [Commented] (SOLR-12879) Query Parser for MinHash/LSH

2018-10-17 Thread Tommaso Teofili (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653938#comment-16653938
 ] 

Tommaso Teofili commented on SOLR-12879:


bq. Should the score from the overall query be normalised?

I think that may depend: in some edge cases, non-normalized scores may introduce 
unexpected bias. But on the whole I don't think it should be.
