Re: Position search

2019-10-16 Thread Tim Casey
Adi,

If you are looking for something specific, you might want to try something
different.  Before you search 'the end of a document', you might think
about segmenting the document and searching specific segments.  A lot of
things, like email, end with signatures.  Those use fairly standard
language: mostly the same in meaning, but differing in specific wording.
They are a common segment.

If you are searching something like research papers, then you would be
thinking about the conclusion (?) or the bibliography (?).  Which one does
not matter, but there will be specific segments.

I think you will find the last N tokens of a document have some odd
categories within the search results.  I might guess you have a different
purpose in mind.  Either way, you would likely do better to segment what
you are searching.
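
A minimal sketch of the segmenting idea, assuming plain-text email where the
signature follows the conventional "-- " separator line (an assumption; a lot
of real mail has no such line). The keys `body` and `signature` are
hypothetical field names, and each segment would be indexed into its own field.

```python
def split_email(text: str) -> dict:
    """Split a plain-text email into body and signature segments.

    Assumes the signature starts after the first line that is "--"
    once trailing whitespace is removed. If there is no such line,
    the whole text is treated as body.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.rstrip() == "--":
            return {
                "body": "\n".join(lines[:i]).strip(),
                "signature": "\n".join(lines[i + 1:]).strip(),
            }
    return {"body": text.strip(), "signature": ""}


doc = split_email("Hi,\nsee you\n-- \nTim\nAcme")
print(doc["signature"])
```

A search for "the end of the email" then becomes an ordinary search against
the signature field.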

tim

On Mon, Oct 14, 2019 at 11:25 PM Kaminski, Adi wrote:

> Hi,
> What's the recommended way to search in Solr (assuming 8.2 is used) for
> specific terms/phrases/expressions while limiting the search by
> position?
> For example, to search only in the first/last 100 words of the document?
>
> Is there any built-in functionality for that?
>
> Thanks in advance,
> Adi
>
>
> This electronic message may contain proprietary and confidential
> information of Verint Systems Inc., its affiliates and/or subsidiaries. The
> information is intended to be for the use of the individual(s) or
> entity(ies) named above. If you are not the intended recipient (or
> authorized to receive this e-mail for the intended recipient), you may not
> use, copy, disclose or distribute to anyone this message or any information
> contained in this message. If you have received this electronic message in
> error, please notify us by replying to this e-mail.
>


Re: Position search

2019-10-16 Thread Alexandre Rafalovitch
Well, after some digging and trying to recall things:
1) The XML Query Parser lets you specify a query in a different way from
normal query parameters:
https://lucene.apache.org/solr/guide/8_1/other-parsers.html#xml-query-parser
2) SpanFirst lets you anchor the search to the start of the text and
give the number of initial tokens to search within. It is not well
documented, but apparently somebody did some tests:
https://coding-art.blogspot.com/2016/05/apache-solr-xml-query-parser.html
3) SpanFirst is actually a simpler use case of a more general matcher
(SpanPositionRangeQuery)
4) SpanPositionRangeQuery is not yet exposed in Solr, but will be in
8.3: https://issues.apache.org/jira/browse/SOLR-13663

So, I would test your example with the XML Query Parser and SpanFirst
(perhaps on the latest 8.x Solr). If that works, you have an approach for
at least the "first X tokens" query and know you have an easy upgrade when
8.3 is out (soon). Alternatively, you can play with SpanFirst and reversal
of the field.

Regards,
   Alex.
P.S. Also, SpanFirst apparently boosts matches early in the text
higher than those later. That's in the mailing list archive
discussions, which you can search on the web. E.g.
https://lists.apache.org/thread.html/014db9dcef44a8f9641600d19cfaa528f33bac676b7ac68903537b75@%3Csolr-user.lucene.apache.org%3E
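
As a rough sketch of what such a query might look like for the "thanks" NEAR
"support" in the first 30 words case, the snippet below builds a query string
for the XML Query Parser. The element and attribute names (`SpanFirst` with
`end`, `SpanNear` with `slop` and `inOrder`, `SpanTerm`, `fieldName`) follow
Lucene's XML query syntax as far as I know; the field name `text` and the slop
of 5 are made up for illustration, and none of this has been verified against
Solr 8.2.

```python
import xml.etree.ElementTree as ET
from typing import List


def span_first_near(field: str, terms: List[str], slop: int, end: int) -> str:
    """Build an {!xmlparser} query: all terms near each other, and the
    whole match ending within the first `end` token positions."""
    first = ET.Element("SpanFirst", {"end": str(end)})
    near = ET.SubElement(
        first,
        "SpanNear",
        {"fieldName": field, "slop": str(slop), "inOrder": "false"},
    )
    for term in terms:
        # Assumption: SpanTerm text is not analyzed, so pass the indexed
        # (for example lowercased) form of each term.
        ET.SubElement(near, "SpanTerm").text = term
    return "{!xmlparser}" + ET.tostring(first, encoding="unicode")


q = span_first_near("text", ["thanks", "support"], slop=5, end=30)
print(q)
```

The resulting string would be sent as the `q` parameter. Because the window
size is part of the query, each user search can pick its own value.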


Re: Position search

2019-10-16 Thread Erick Erickson
Three things off the top of my head, in order of how long it’d take to 
implement:

***
If it’s _always_ some distance from the start or end, index special beginning 
and end tags, perhaps nonsense strings like BEGINslkdjfhsldkfhsdkfh and 
ENDslakshalskdfhj. Now your searches become phrase queries with slop. Searching 
for “erick in the first 100 words” becomes:

"BEGINslkdjfhsldkfhsdkfh erick"~100
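
A small sketch of both halves of this idea: wrapping the text with the marker
tokens before indexing, and building the sloppy phrase query at search time.
The helper names are hypothetical, and it assumes the markers survive analysis
as single tokens. The slop is an approximate window, not an exact word count.

```python
BEGIN = "BEGINslkdjfhsldkfhsdkfh"
END = "ENDslakshalskdfhj"


def wrap_with_markers(text: str) -> str:
    """Index-time step: surround the field value with the marker tokens."""
    return f"{BEGIN} {text} {END}"


def near_start(term: str, n: int) -> str:
    """Phrase query with slop: `term` within roughly n words of the start."""
    return f'"{BEGIN} {term}"~{n}'


def near_end(term: str, n: int) -> str:
    """Phrase query with slop: `term` within roughly n words of the end."""
    return f'"{term} {END}"~{n}'


print(near_start("erick", 100))
print(near_end("book", 30))
```

Since the window is just the slop value, this also covers the case where
every search uses a different N.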

***
Index each term with a payload indicating its position and use a payload 
function to determine whether the term should count as a hit. You’d probably 
have to have a field telling you how long the field is to know what offset “50 
words from the end” is.
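
For the indexing side, a sketch of producing the payload-annotated text,
assuming a field type that splits tokens on whitespace and reads an integer
payload after a `|` delimiter (I believe Solr's default configset ships
something like this, but check your schema). It also returns the token count
for the separate length field. How the payload is read back at query time is
left out here.

```python
from typing import Tuple


def with_position_payloads(text: str, delimiter: str = "|") -> Tuple[str, int]:
    """Attach each token's zero-based position as a delimited payload.

    Returns the annotated text and the total number of tokens.
    Assumes whitespace tokenization and that no token contains the delimiter.
    """
    tokens = text.split()
    annotated = " ".join(
        f"{token}{delimiter}{position}" for position, token in enumerate(tokens)
    )
    return annotated, len(tokens)


annotated, length = with_position_payloads("hello thanks for calling")
print(annotated, length)
```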

***
Get into the low-level Lucene code. After all, if you index the position 
information to support phrase queries, you have exactly the position of the 
word. NOTE: you’d also probably have to index a separate field with the total 
length of the field in it so you know what position “100 words from the end” 
is. I suspect you could make this the most efficient, but I wouldn’t go here 
unless your performance is poor as it’d take some development work.
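
To make the arithmetic concrete, here is a plain-Python illustration (not
Lucene code) of why the field length is needed: "first N" is a fixed range of
positions, while "last N" depends on how many tokens the field has. Tokens are
assumed to be whitespace-separated with zero-based positions.

```python
from typing import List


def position_window(field_length: int, n: int, from_end: bool = False) -> range:
    """Zero-based token positions covered by 'first n' or 'last n' words."""
    n = max(0, min(n, field_length))
    if from_end:
        return range(field_length - n, field_length)
    return range(0, n)


def term_in_window(tokens: List[str], term: str, n: int, from_end: bool = False) -> bool:
    """True if `term` occurs at any position inside the window."""
    window = position_window(len(tokens), n, from_end)
    return any(tokens[position] == term for position in window)


tokens = "hello thanks for calling the support how can I help you".split()
print(term_in_window(tokens, "thanks", 3))
print(term_in_window(tokens, "help", 2, from_end=True))
```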

Note: I haven’t thought these out very carefully so caveat emptor.

Here’s a place to get started with payloads if you decide to go that route:

https://lucidworks.com/post/solr-payloads/

Best,
Erick


RE: Position search

2019-10-16 Thread Kaminski, Adi
Hi,
These are really text positions.
For example, I have a document: "hello thanks for calling the support how can I 
help you"

And in the application I would like to search for documents that match "thanks" 
NEAR "support" only in the first 30 words of the document (the greeting part, for 
example), and not in the middle/end part of the document.

Regards,
Adi


Re: Position search

2019-10-16 Thread Alexandre Rafalovitch
So are these really text locations, or rather actually sections of the
document? If the latter, can you parse out sections during indexing?

Regards,
 Alex


RE: Position search

2019-10-16 Thread Kaminski, Adi
Hi,
Thanks for the responses.

It's a soft boundary that results from dynamic syntax in our application, 
so it may vary between user searches: one user can search for "word1" in the 
first 30 words, and another can search for "word2" in the first 10 words. 
The use case is to match some terms/phrases in specific places in the 
document in order to identify scripts/specific word occurrences.

So I guess copy field won't work here.

Any other suggestions/thoughts?
Maybe some hidden position filters at the native level to limit matches from 
the start/end of the document?

Thanks,
Adi


Re: Position search

2019-10-15 Thread Tim Casey
If this is about a normalized query, I would put the normalization text
into a specific field.  The reason is that you may want to search the
overall text during any form of expansion phase of searching for data.
That is, maybe you want to know the context of up to the 120th word.  At
least you have both.
Also, you may want to note which normalized fields were truncated or were
simply too small. This would give some guidance as to the bias of the
normalization.  If 95% of the fields were not truncated, there is a chance
you are not normalizing well because you have a set of particularly short
messages.  So I would expect a small set of side fields recording this.
This would allow you to carry the measures along with the data.
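
A sketch of what carrying those measures could look like when building the
document to index. All field names here (`text`, `text_head`,
`head_truncated`, `word_count`) are hypothetical, and words are assumed to be
whitespace-separated.

```python
from typing import Dict, Iterable


def head_with_measures(text: str, max_words: int = 100) -> Dict[str, object]:
    """Keep the full text, the normalized head, and side fields that
    record whether the head was truncated and how long the text was."""
    words = text.split()
    return {
        "text": text,
        "text_head": " ".join(words[:max_words]),
        "head_truncated": len(words) > max_words,
        "word_count": len(words),
    }


def truncated_fraction(docs: Iterable[Dict[str, object]]) -> float:
    """Share of documents whose head was actually cut off."""
    docs = list(docs)
    if not docs:
        return 0.0
    return sum(1 for d in docs if d["head_truncated"]) / len(docs)


batch = [head_with_measures("a b c d", 2), head_with_measures("a", 2)]
print(truncated_fraction(batch))
```

A low truncated fraction would suggest the corpus is mostly short messages,
which is the bias mentioned above.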

tim


Re: Position search

2019-10-15 Thread Alexandre Rafalovitch
Is the 100 words a hard boundary or a soft one?

If it is a hard one (always 100 words), the easiest is probably a copy
field: in the (unstored) copy, trim off whatever you don't want to
search, possibly using regular expressions. Of course, "what's a word"
is an important question here.

Similarly, you could do that with Update Request Processors and
clone/process the field even before it hits the schema. Then you could
store the extract for highlighting purposes.
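
As an illustration of the regular-expression trim, here is a client-side
stand-in in Python rather than actual Solr configuration. It answers "what's a
word" in the simplest way, as a run of non-whitespace characters, which is an
assumption and may not match your analyzer.

```python
import re


def first_words(text: str, n: int) -> str:
    """Return the leading part of `text` holding at most the first n
    whitespace-delimited words, keeping the original spacing between them."""
    if n <= 0:
        return ""
    # Optional leading whitespace, then up to n (word + trailing whitespace) groups.
    match = re.match(r"\s*(?:\S+\s*){0,%d}" % n, text)
    return match.group(0).strip()


print(first_words("hello thanks for calling the support", 3))
```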

Regards,
   Alex.


RE: Position search

2019-10-15 Thread Markus Jelsma
Hello Adi,

There is no SpanLastQuery or equivalent. But you could reverse the text and use 
SpanFirstQuery. Or, perhaps easier, add a bogus term to the end of the field 
and use PhraseQuery.
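
A sketch of the reversal idea, assuming it is done on whitespace-separated
words before indexing into a second, hypothetical reversed field. The last N
words of the original become the first N words of the reversed copy, so a
SpanFirstQuery on that field covers the "from the end" case.

```python
def reverse_words(text: str) -> str:
    """Reverse the word order (not the characters) of a text."""
    return " ".join(reversed(text.split()))


original = "hello thanks for calling the support how can I help you"
print(reverse_words(original))
```

Note that word order inside a match flips too, so any ordered phrase or NEAR
clause has to be written in reverse against that field.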

Regards,
Markus
 

RE: Position search

2019-10-15 Thread Kaminski, Adi
Hi Markus,
Thanks for the guidance.

Is there any official Solr documentation for that? I tried some googling, but 
only some Stack Overflow / Lucene posts are available.

Also, will that approach work for the other use case of searching from the end 
of documents?
For example, if I need to perform some term search from the end, e.g. "book" in 
the last 30 or 100 words.

Is there a SpanLastQuery?

Thanks,
Adi


RE: Position search

2019-10-15 Thread Markus Jelsma
Hello Adi,

Try SpanFirstQuery. It limits the search to within the Nth term in the field.

Regards,
Markus

 
 