Re: Lucene custom Query - efficiently and compare retrieve multiple document fields

2018-02-12 Thread Evert Wagenaar
Use a multi-field query, e.g. Elasticsearch's multi_match.
Like this:

{
  "multi_match": {
    "query": "quick brown fox",
    "fields": ["title", "body"]
  }
}




Re: Lucene custom Query - efficiently and compare retrieve multiple document fields

2018-02-12 Thread Dominik Safaric
Unfortunately, you've misunderstood my question. FuzzyQuery does not satisfy
my requirements: it is based on Levenshtein distance, not Hamming distance.
Hence the need to implement a custom Query.

As asked, how does Lucene internally store multi-valued fields, and is it
possible to retrieve the values in the same order in which they were stored?
In particular, I'd like to retrieve a multi-valued keyword field this way.
 
Kind regards,
Dominik
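
On the ordering question: as far as I know, Lucene stored fields return the values of a multi-valued field in the order they were added, whereas SortedSetDocValues keeps each document's values sorted and de-duplicated, so insertion order is lost there. One possible workaround, sketched below under assumptions, is to pack the ordered values into a single byte[] per document (the kind of value a BinaryDocValuesField holds) and decode it at scoring time. The class name OrderedValuesCodec and the choice of 64-bit values are hypothetical and not taken from the thread.

```java
import java.nio.ByteBuffer;

// Hypothetical helper (not part of Lucene): packs an ordered array of 64-bit
// values into one byte[] so that order and duplicates survive a round trip.
final class OrderedValuesCodec {
    private OrderedValuesCodec() {}

    // Layout: 4-byte count, followed by each value as 8 big-endian bytes.
    static byte[] encode(long[] values) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + values.length * Long.BYTES);
        buf.putInt(values.length);
        for (long v : values) {
            buf.putLong(v);
        }
        return buf.array();
    }

    // Reads the count, then the values, in exactly the order they were written.
    static long[] decode(byte[] blob) {
        ByteBuffer buf = ByteBuffer.wrap(blob);
        int count = buf.getInt();
        long[] values = new long[count];
        for (int i = 0; i < count; i++) {
            values[i] = buf.getLong();
        }
        return values;
    }
}
```

The encoded bytes could then be indexed as one binary doc value per document; whether that fits depends on how the keywords are actually encoded, which the thread leaves open.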



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Lucene custom Query - efficiently and compare retrieve multiple document fields

2018-02-12 Thread Adrien Grand
Filtering by one query and scoring by a different query is easy: just put
the filter in a FILTER clause of a BooleanQuery and the scoring query in a
SHOULD clause. Documents that do not match the SHOULD clause will have a
score of zero.

I wonder whether you are looking for something like this:

Query q = new BooleanQuery.Builder()
    // FILTER: must match, but does not contribute to the score
    .add(new FuzzyQuery(new Term("coarse_grained", "search_term")), Occur.FILTER)
    // SHOULD: optional, contributes to the score when it matches
    .add(new FuzzyQuery(new Term("fine_grained", "search_term")), Occur.SHOULD)
    .build();

It's not clear to me why you need to retain order: the order of your values
should not matter?



Re: Lucene custom Query - efficiently and compare retrieve multiple document fields

2018-02-12 Thread Dominik Safaric
In particular, I have a document schema as follows:

{
  "images": [{
    "image_id": 1,
    "features": {
      "coarse_grained": ,
      "fine_grained": [**]
    }
  }]
}

In the first run, using a custom Query instance, I'd like to hit documents
by matching the *coarse_grained* field. A document matches if the Hamming
distance between the value of its *coarse_grained* field and the value
passed through the REST API is less than or equal to a set threshold. I'd
then like to score the hit documents using the *fine_grained* field values,
which are an array of keywords. A similar method, using Hamming distance as
the similarity measure, applies in this case as well.
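
The matching rule above can be sketched in plain Java, assuming the coarse_grained value is a 64-bit binary hash (the thread does not say how it is encoded). HammingFilter and its method names are hypothetical; this only illustrates the distance check and is not a Lucene Query implementation.

```java
// Hypothetical helper: Hamming-distance threshold check on 64-bit hashes.
final class HammingFilter {
    private HammingFilter() {}

    // Number of bit positions in which the two hashes differ.
    static int hammingDistance(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    // A document matches when its hash is within maxDistance bits of the query hash.
    static boolean matches(long docHash, long queryHash, int maxDistance) {
        return hammingDistance(docHash, queryHash) <= maxDistance;
    }
}
```

For example, HammingFilter.matches(0b1011L, 0b0001L, 2) is true because the two values differ in exactly two bits.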

What I'm concerned with is the following: in the second (scoring) phase,
I'd like to score documents using all values of the *fine_grained* array of
keywords. How can I efficiently retrieve these values for each document, in
the same order in which they were inserted?

Thanks in advance,
Dominik
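
For the scoring phase described above, here is a minimal sketch assuming each fine_grained keyword is a 64-bit hash and that the document and query arrays are compared position by position, which is why insertion order would matter. FineGrainedScorer and the normalisation of the score to the range [0, 1] are illustrative assumptions, not something the thread specifies.

```java
// Hypothetical scorer: position-wise Hamming similarity over two ordered arrays.
final class FineGrainedScorer {
    private FineGrainedScorer() {}

    // Returns 1.0 when every position is identical and 0.0 when every bit differs.
    static float score(long[] docValues, long[] queryValues) {
        if (docValues.length != queryValues.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        if (docValues.length == 0) {
            return 0f;
        }
        long differingBits = 0;
        for (int i = 0; i < docValues.length; i++) {
            // Compare the i-th document value with the i-th query value.
            differingBits += Long.bitCount(docValues[i] ^ queryValues[i]);
        }
        long totalBits = (long) docValues.length * Long.SIZE;
        return 1f - (float) differingBits / totalBits;
    }
}
```

With this scheme the same values in a different order can produce a different score, so the values would need to be read back in insertion order.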



Re: Lucene custom Query - efficiently and compare retrieve multiple document fields

2018-02-11 Thread Adrien Grand
Whether this is doable is going to depend on what you mean by "match[ing]
documents according to criteria X". Can you give an example?

On Fri, 9 Feb 2018 at 14:47, Dominik Safaric wrote:

> Hi,
>
> I intend to implement a custom Query using Lucene 6.x and, due to the lack
> of documentation on this particular topic, I have the following questions.
>
> The query is expected to implement a two-phase search, in the sense that
> during the first run it matches documents according to criteria X, whereas
> during the second it does so according to criteria Y of another document
> field. Can this be accomplished using the TwoPhaseIterator?
>
> Secondly, the query as expressed through the API will not specify a single
> query field, but instead a field that stores an array of objects. From an
> implementation point of view, can I use the LeafReader to retrieve an
> object that would map to a Java Map, which I can later use to access a
> certain field within the object? Or is it perhaps more advisable to get the
> document instance using the LeafReader's document(int docID) method, and
> then load particular fields? I'm afraid that might hurt overall performance
> because the documents would need to be loaded from disk.
>
> Thanks in advance,
> Dominik
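
For context on the TwoPhaseIterator question: in Lucene, a TwoPhaseIterator pairs a cheap approximation (a superset of the matches) with a matches() check that confirms each candidate, so both phases decide whether a single document matches. The plain-Java toy below models that idea only; TwoPhaseSketch is a hypothetical name and none of this is the Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Toy model of two-phase matching: a cheap candidate list plus a costly verifier.
final class TwoPhaseSketch {
    private TwoPhaseSketch() {}

    // candidateDocIds plays the role of the approximation (may contain false positives);
    // verifier plays the role of the per-document confirmation step.
    static List<Integer> search(int[] candidateDocIds, IntPredicate verifier) {
        List<Integer> hits = new ArrayList<>();
        for (int docId : candidateDocIds) {
            if (verifier.test(docId)) {
                hits.add(docId);
            }
        }
        return hits;
    }
}
```

In this model the verifier runs only on candidates, which is where an expensive check such as a per-document distance computation would go.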