Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2020-03-14 Thread Ivan Pavlukhin
Yuriy,

> Let me summarize the approaches:
I agree with your reasoning; option 2 sounds like the best one to me as well.

I will look into the merge-sort strategy some time later.

Best regards,
Ivan Pavlukhin


Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2020-03-13 Thread Yuriy Shuliga
Ivan,

I have made changes in the fork that reflect the merge-sort strategy: the
query future iterator now unblocks as soon as the first pages have been
delivered from all nodes; then it waits for the next portion of pages, and so on.
https://github.com/shuliga/ignite/commit/c84f04c18f67e99ab7bc0a7893b75f1dc83a76bd
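
For illustration only, the page-wise merge could look roughly like the sketch
below. It is not the code from the commit above: Scored, NodeCursor and
RankedPageMerger are hypothetical names, each node is assumed to return its
entries already sorted by descending score, and remote page requests are
simulated by an in-memory iterator of pages.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical value + score pair; not an Ignite class. */
final class Scored<T> {
    final T val;
    final float score;

    Scored(T val, float score) {
        this.val = val;
        this.score = score;
    }
}

/** Hypothetical per-node cursor: buffers one page and pulls the next one only when the buffer is drained. */
final class NodeCursor<T> {
    /** Stands in for remote page requests. */
    private final Iterator<List<Scored<T>>> pages;
    private final Deque<Scored<T>> buf = new ArrayDeque<>();

    NodeCursor(Iterator<List<Scored<T>>> pages) {
        this.pages = pages;
    }

    /** Returns the best pending entry of this node, or null when the node is exhausted. */
    Scored<T> peek() {
        while (buf.isEmpty() && pages.hasNext()) {
            buf.addAll(pages.next()); // a real implementation would wait for the page here
        }
        return buf.peek();
    }

    /** Removes and returns the best pending entry, or null when the node is exhausted. */
    Scored<T> poll() {
        peek();
        return buf.poll();
    }
}

final class RankedPageMerger {
    /** K-way merge of per-node streams, each sorted by descending score. */
    static <T> List<T> merge(List<NodeCursor<T>> nodes, int limit) {
        // Max-heap ordered by the score at the head of each cursor.
        PriorityQueue<NodeCursor<T>> heap = new PriorityQueue<>(
            (a, b) -> Float.compare(b.peek().score, a.peek().score));

        // The first page of every node is needed before anything can be returned.
        for (NodeCursor<T> n : nodes) {
            if (n.peek() != null) {
                heap.add(n);
            }
        }

        List<T> res = new ArrayList<>();

        while (!heap.isEmpty() && res.size() < limit) {
            NodeCursor<T> best = heap.poll();
            res.add(best.poll().val);

            // peek() loads this node's next page only if its buffer is empty.
            if (best.peek() != null) {
                heap.add(best);
            }
        }

        return res;
    }
}
```

With two nodes whose pages hold scores (0.9, 0.4 | 0.1) and (0.8, 0.5), the
merge returns the entries in the order 0.9, 0.8, 0.5, 0.4, 0.1, and the second
page of the first node is requested only after its first page is drained.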

Please validate the design if you wish.

Regarding the ranking field in the entity:

Entities for text queries in the search domain are usually treated as
documents with some metadata.
This can be an id, an issued/expired date, and the document score returned
for a given query.
It is common to include such fields in the entity design.

Answer to your question about omitting QueryRankField:
- The response records will then simply come in arbitrary order. This
should not fail TextQuery execution.

Another point, about rank values across different indices:
- Ranks are to be used for comparison between entities within a particular
query response; they are not intended to be absolute across the system.

Let me summarize the approaches:
1. Subclassing from Ranked.class.
pros: the simplest and Ignite-natural approach
cons: implicit nature, limits entity inheritance

2. Explicitly introducing a dedicated field annotated with @QueryRankField.
pros: Ignite-natural approach, easy to introduce, explicitly controlled by
the developer
cons: adds extra metadata to the entity

3. Wrapping the entity response with rank data, used for merge sort, without
exposing it to the client.
pros: leaves the entity design clean
cons: rank is not available to the client; development will require complex
changes in the query execution / entity marshaling mechanisms

I'd stay with option 2 as the most balanced solution of these.
What do you think?
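
To make option 2 concrete, here is a minimal sketch. Everything in it is an
assumption for illustration: @QueryRankField does not exist in Ignite (the name
is only proposed in this thread), and Article, PlainNote and
RankFields.fillRank are made-up names. The helper shows the intended behaviour:
write the Lucene score into the annotated float field if there is one, and
otherwise leave the entity untouched so that the query still succeeds, only
without ordering.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

/** Hypothetical annotation proposed in this thread; it is not part of the Ignite API. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface QueryRankField {
}

/** Example entity: a document with a slot for the per-query score. */
class Article {
    String text;

    @QueryRankField
    float rank;

    Article(String text) {
        this.text = text;
    }
}

/** Entity without a rank field: nothing is filled in, and nothing fails. */
class PlainNote {
    String text = "note";
}

final class RankFields {
    /**
     * Writes the score into the first float field annotated with @QueryRankField,
     * searching the class and its superclasses.
     *
     * @return true if a rank field was found and filled, false otherwise.
     */
    static boolean fillRank(Object entity, float score) {
        for (Class<?> c = entity.getClass(); c != null; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (f.isAnnotationPresent(QueryRankField.class) && f.getType() == float.class) {
                    try {
                        f.setAccessible(true);
                        f.setFloat(entity, score);
                    }
                    catch (IllegalAccessException e) {
                        throw new IllegalStateException("Cannot set rank field: " + f, e);
                    }

                    return true;
                }
            }
        }

        return false;
    }
}
```

The boolean result lets the caller decide whether a merge sort by rank is
possible at all for a given entity type.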

BR,
Yuriy Shuliha





Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2020-03-10 Thread Ivan Pavlukhin
Igniters,

The discussion unintentionally continued outside of the dev list. I am
bringing it back; you can find it below. Do not hesitate to join if you
have thoughts on the questions raised. Maybe you have ideas on how to enrich
text query results with score/rank information.

Tue, 10 Mar 2020 at 09:11, Yuriy Shuliga wrote:

> Yes, please do.
>
> Tue, 10 Mar 2020, 02:26, Ivan Pavlukhin wrote:
>
>> Yuriy,
>>
>> I noticed that at some point our discussion moved off the Ignite dev
>> list. Would you mind if I bring it back to the dev list?
>>
>> Best regards,
>> Ivan Pavlukhin
>>
>> Tue, 10 Mar 2020 at 03:25, Ivan Pavlukhin wrote:
>> >
>> > > PS As far as I can see, there is no chance to get on the 2.8 release
>> > > train. What will be the next version/date we can aim for with this
>> > > update?
>> >
>> > Yes, 2.8 is already available and the community is working on
>> > finalizing activities (e.g. publishing documentation). I do not have any
>> > reliable expectations about the next releases. I suppose that there could
>> > be a couple of maintenance releases like 2.8.1, as several problems have
>> > already been discovered. I do not know whether the next more significant
>> > release is going to be 2.9 or even the major release 3.0. It sounds
>> > realistic to aim for 2.9 because there are already several "almost ready"
>> > features in master. In my mind it is a good idea to start a discussion
>> > about the next releases on the dev list.
>> >
>> > Best regards,
>> > Ivan Pavlukhin
>> >
>> > Tue, 10 Mar 2020 at 00:58, Ivan Pavlukhin wrote:
>> > >
>> > > Hi Yuriy,
>> > >
>> > > Sorry for the late response.
>> > >
>> > > > A suitable solution without subclassing might be:
>> > > > 1. Explicitly add a float field to the entity
>> > > > 2. Annotate it with a special @QueryRankField (for instance)
>> > > > 3. Fill in this field with docScore in GridLuceneIndex, pass it back
>> > > > to the initiating node
>> > > > 4. Possibly still need to proxy the entity to add the Comparable
>> > > > interface.
>> > > > 5. Perform merge sort on the initiating node
>> > >
>> > > Possibly I missed it, but one point is not clear to me. What will
>> > > happen if an entity class does not have a field annotated with
>> > > QueryRankField?
>> > >
>> > > And I am still not sure that it is a proper (enough) approach. The
>> > > thing which bothers me is the transient and dynamic nature of the "rank"
>> > > field. It does belong to entity, it can have different values for the
>> > > same entity (e.g. different indices are used).
>> > >
>> > > I would like to experiment with the code a little bit. But most likely I
>> > > will have a chance only at the end of this week.
>> > >
>> > > Best regards,
>> > > Ivan Pavlukhin
>> > >
>> > > Mon, 2 Mar 2020 at 20:09, Yuriy Shuliga wrote:
>> > > >
>> > > > Hi Ivan,
>> > > >
>> > > > I have concerns about the entity annotation variant.
>> > > > Wrapping into a dynamic proxy for passing back will be quite a
>> > > > complex thing that requires changes in IgniteCacheObjectProcessor
>> > > > and entity marshaling.
>> > > >
>> > > > A suitable solution without subclassing might be:
>> > > > 1. Explicitly add a float field to the entity
>> > > > 2. Annotate it with a special @QueryRankField (for instance)
>> > > > 3. Fill in this field with docScore in GridLuceneIndex, pass it back
>> > > > to the initiating node
>> > > > 4. Possibly still need to proxy the entity to add the Comparable
>> > > > interface.
>> > > > 5. Perform merge sort on the initiating node
>> > > >
>> > > > Would you consider this approach, or return to using the Ranked
>> > > > superclass?
>> > > >
>> > > > Regarding your proposal to implement merge sort - definitely yes.
>> > > > I will implement this.
>> > > > Sorry, I didn't understand you earlier )
>> > > >
>> > > > BR,
>> > > > Yuriy Shuliha
>> > > >
>> > > > PS As far as I can see, there is no chance to get on the 2.8 release
>> > > > train. What will be the next version/date we can aim for with this
>> > > > update?
>> > > >
>> > > >
>> > > > Fri, 28 Feb 2020 at 23:08, Ivan Pavlukhin wrote:
>> > > >>
>> > > >> Hi Yuriy,
>> > > >>
>> > > >> Sorry for the late response and thank you for your comments.
>> > > >>
>> > > >> The approach with the @Ranked annotation looks cleaner to me from an
>> > > >> API point of view.
>> > > >>
>> > > >> Regarding merging responses from multiple nodes, I suppose that a good
>> > > >> enough solution is possible:
>> > > >> 1. Request one page of entries from each node.
>> > > >> 2. Return one page to the user (as there is definitely a page of the
>> > > >> best results already).
>> > > >> 3. Request the next result pages from the nodes corresponding to the
>> > > >> pages we exposed to the user (actually, nodes having fewer than 1 page
>> > > >> of pending results). Repeat from step 2.
>> > > >>
>> > > >> Some kind of sort merge plus backpressure. The backpressure part might
>> > > >> be left as an improvement.
>> > > >>
>> > > >> What do you think?
>> > > >>
>> > > >> Best regards,
>> > > >> Ivan Pavlukhin
>> > > >>
>> > > >> Tue, 18 Feb 2020 at 18:27, Yuriy Shuliga wrote:
>> > > >>
>> > > >> >
>> > > >> > Hi Ivan,
>> > > >> >
>> > > >> > Thank you for keeping 

Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2020-01-23 Thread Yuriy Shuliga
Hi Ivan,

Actually, I have engaged another developer to help bring TextQueries to a
correctly working state.
For now we have a solution that adds ordering functionality to distributed
TextQueries.
It is developed and tested locally. I can share details here, then we can
discuss and decide whether to create a corresponding ticket.

The starting point is that, by nature, Lucene's search results are ordered
by docScore (a float) by default.
So we created an abstract class Ranked, which implements Comparable and
Serializable and contains a float rank value.

Each entity expected to be ordered on TextQuery merge should be
derived from this class.
All subsequent actions are done under the hood automatically by the
new CacheQueryFutureRankedDecorator, which contains a special
BlockingIterator used for the correct merge of distributed responses.
Text queries with Ranked entities are automatically wrapped with this new
decorator.
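
A rough sketch of what such a base class might look like is given below. The
accessor names and the Doc entity are assumptions made for the example; the
class described above may differ in detail.

```java
import java.io.Serializable;

/** Sketch of the base class described above; member names are assumed. */
abstract class Ranked implements Comparable<Ranked>, Serializable {
    private static final long serialVersionUID = 0L;

    /** Lucene document score for the current query. */
    private float rank;

    float rank() {
        return rank;
    }

    void rank(float rank) {
        this.rank = rank;
    }

    /** A higher rank sorts first, so a plain sort yields best matches at the top. */
    @Override public int compareTo(Ranked o) {
        return Float.compare(o.rank, rank);
    }
}

/** Hypothetical entity that opts into ordered results by extending Ranked. */
class Doc extends Ranked {
    final String text;

    Doc(String text, float rank) {
        this.text = text;
        rank(rank);
    }
}
```

Sorting a list of Doc instances with Collections.sort then puts the entry with
the highest rank first. Note that this ordering only looks at the rank, so it
is not consistent with equals.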

This is an outline of the solution. Please ask if you have any questions.
Alternatively, I can create a ticket and link a PR with the already tested
(so far only locally) solution to it for detailed review.

BR,
Yuriy




Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2020-01-20 Thread Ivan Pavlukhin
Hi Yuriy,

I would just like to understand the current state. Are you still working on
Ignite text queries? If not, are you going to continue with it?


-- 
Best regards,
Ivan Pavlukhin


Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-12-13 Thread Ivan Pavlukhin
Yuriy,

Sure, I will be glad to help.

> - incorrect nodes/partition selection during querying?
Apparently this is the problem. If you find it really complicated to
understand and debug, I can dig deeper and share my view of how the
problem can be fixed.


-- 
Best regards,
Ivan Pavlukhin


Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-12-11 Thread Yuriy Shuliga
I will look into the MOVING partition issue.
But I also need some guidance there.

Ivan, would you mind being that person?

The question is whether we have an issue with:
- wrong storage targets during indexing, OR
- incorrect node/partition selection during querying?

BR,
Yuriy Shuliha





Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-12-10 Thread Ilya Kasnacheev
Hello!

Yes, I guess you are right :(

I can surely fix the range issue; it's just that it was so broken that I
could not figure out the correct behavior for this case.

Regards,
-- 
Ilya Kasnacheev




Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-12-03 Thread Ivan Pavlukhin
*on topologies

Tue, 3 Dec 2019 at 17:15, Ivan Pavlukhin wrote:
>
> Ilya, Yuriy,
>
> It seems that text queries can return incorrect results on tologies
> with MOVING partitions. I left a comment in JIRA [1].
>
> [1] https://issues.apache.org/jira/browse/IGNITE-12401



-- 
Best regards,
Ivan Pavlukhin


Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-12-03 Thread Ivan Pavlukhin
Ilya, Yuriy,

It seems that text queries can return incorrect results on tologies
with MOVING partitions. I left a comment in JIRA [1].

[1] https://issues.apache.org/jira/browse/IGNITE-12401


-- 
Best regards,
Ivan Pavlukhin


Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-12-02 Thread Ivan Pavlukhin
Ilya,

I checked your test on a revision before "limit" and it fails there as
well. Could you please double check?


-- 
Best regards,
Ivan Pavlukhin


Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-12-02 Thread Ilya Kasnacheev
Hello!

The problem is NOT specific to range queries. Range queries were broken
previously and they are broken now, but now even a simple "token in field
with limit" returns duplicates.

Before limits were introduced, any tested use cases were unaffected by
duplicates, but now they are.

Regards,
-- 
Ilya Kasnacheev




Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-12-02 Thread Ivan Pavlukhin
And is the problem specific to range queries or not?


-- 
Best regards,
Ivan Pavlukhin


Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-12-02 Thread Ivan Pavlukhin
Yuriy,

Thank you for investigating the problem [1]. I still cannot understand how
the problem relates to the introduced "limit". Is it right that there were
no duplicates before "limit" support? Now that the support is introduced,
do only limited queries contain duplicates, or unlimited ones, or both?

[1] https://issues.apache.org/jira/browse/IGNITE-12401

On Thu, 28 Nov 2019 at 18:30, Ilya Kasnacheev wrote:
>
> Hello!
>
> I have just found what I consider a major regression in Text Queries: it
> seems to me that text queries with limits will return same key-value
> entries multiple times.
>
> Please check the issue https://issues.apache.org/jira/browse/IGNITE-12401
> and corresponding build
> https://ci.ignite.apache.org/viewQueued.html?itemId=4799634
>
> Regards,
> --
> Ilya Kasnacheev



-- 
Best regards,
Ivan Pavlukhin


Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-11-28 Thread Ilya Kasnacheev
Hello!

I have just found what I consider a major regression in Text Queries: it
seems to me that text queries with limits will return the same key-value
entries multiple times.

Please check the issue https://issues.apache.org/jira/browse/IGNITE-12401
and corresponding build
https://ci.ignite.apache.org/viewQueued.html?itemId=4799634

Regards,
-- 
Ilya Kasnacheev


Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-11-28 Thread Yuriy Shuliga
Nice to hear, Ivan

It is good practice to make sure that an extension of existing functionality
is presented properly, as we expect from Text Queries.
Let's make it work correctly first.

I'm OK to prepare a ticket for adding reduction of sorted responses to
GridCacheDistributedQueryFuture or nearby.
Also, the TextQuery response entity will be extended to carry Lucene's
'docScore' per record.
No open questions are left then.
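
A minimal sketch of the reduction of sorted responses mentioned above, assuming
each node returns its results already ordered by score, highest first. The
names ScoredMergeSketch, Scored and mergeTop are hypothetical and are not part
of Ignite or Lucene; only the idea of carrying a per-record score comes from
this message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: k-way merge of per-node results sorted by score. */
public class ScoredMergeSketch {
    /** Hypothetical response record: a cache key plus the score sent by the node. */
    public static final class Scored {
        public final String key;
        public final float score;

        public Scored(String key, float score) {
            this.key = key;
            this.score = score;
        }
    }

    /** Cursor over one node's sorted results; 'head' is the next unmerged record. */
    private static final class Cursor {
        private final Iterator<Scored> it;
        private Scored head;

        Cursor(Iterator<Scored> it) {
            this.it = it;
            this.head = it.next();
        }

        /** Moves to the next record; returns false when the node is exhausted. */
        boolean advance() {
            if (!it.hasNext())
                return false;
            head = it.next();
            return true;
        }
    }

    /**
     * Merges per-node lists (each sorted by score, descending) into one list
     * of keys in descending score order. A limit of 0 or less means no limit.
     */
    public static List<String> mergeTop(List<List<Scored>> perNode, int limit) {
        // Max-heap on the score of each cursor's current head record.
        PriorityQueue<Cursor> heap = new PriorityQueue<>(
            Comparator.comparingDouble((Cursor c) -> c.head.score).reversed());

        for (List<Scored> page : perNode) {
            if (!page.isEmpty())
                heap.add(new Cursor(page.iterator()));
        }

        List<String> out = new ArrayList<>();
        while (!heap.isEmpty() && (limit <= 0 || out.size() < limit)) {
            Cursor c = heap.poll();
            out.add(c.head.key);
            // Re-insert the cursor only if its node still has records left.
            if (c.advance())
                heap.add(c);
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Scored>> pages = Arrays.asList(
            Arrays.asList(new Scored("a", 0.9f), new Scored("c", 0.4f)),
            Arrays.asList(new Scored("b", 0.7f), new Scored("d", 0.1f)));
        System.out.println(mergeTop(pages, 3));
    }
}
```

The heap holds one cursor per node, so each output record costs a logarithmic
step in the number of nodes. Without a score in the response there is nothing
to compare across nodes, which is why the record needs to carry it.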

BR,
Yuriy Shuliha

On Thu, 28 Nov 2019 at 15:23, Ivan Pavlukhin wrote:

> Folks, Yuriy,
>
> I suppose that we are going to proceed with
>
> >>>
> Reducing on Ignite
>
> The obvious point of distributed response reduction is class
> GridCacheDistributedQueryFuture.
> Though, @Ivan Pavlukhin mentioned class with similar functionality:
> ReduceIndexSorted
> What I see here, that it is tangled with H2 related classes
> (org.h2.result.Row) and might not be unified with TextQuery reduction.
> >>
>
> From my side there is no strict opinion that we should unify
> reduction. Having a separate reduction implementation for text queries
> sounds for me as not bad option as well.
>
> Are there still any open questions?
>
> ср, 27 нояб. 2019 г. в 02:27, Denis Magda :
> >
> > I don't see anything wrong if Yuriy is willing to carry on and keep
> > enhancing our full-text search support that lacks basic capabilities.
> >
> > The basics should be available. If anybody needs an advanced feature they
> > can introduce Solr or ElastiSearch into the final architecture of the
> app.
> >
> > Folks, who of us can help Yuriy with the questions asked? Most like the
> SQL
> > experts are the best candidates here.
> >
> >
> > -
> > Denis
> >
> >
> > On Tue, Nov 26, 2019 at 8:52 AM Ivan Pavlukhin 
> wrote:
> >
> > > Folks,
> > >
> > > IEP is an Ignite-specific thing. In fact, I suppose that we are
> > > already doing it in ASF way by having this dev-list discussion =)
> > >
> > > As for me, implementing "limit" feature for text queries is not so big
> > > to make an IEP. But we might need to create one for next features.
> > >
> > > вт, 26 нояб. 2019 г. в 15:06, Ilya Kasnacheev <
> ilya.kasnach...@gmail.com>:
> > > >
> > > > Hello!
> > > >
> > > > ASF way should probably start with an IEP :)
> > > >
> > > > Regards,
> > > > --
> > > > Ilya Kasnacheev
> > > >
> > > >
> > > > вт, 26 нояб. 2019 г. в 14:12, Zhenya Stanilovsky
> > >  > > > >:
> > > >
> > > > >
> > > > > Ok, lets forgot Solr and go through ASF way, if Yuriy prove this
> > > > > functionality is helpful and PR it, why not ?
> > > > >
> > > > > isn`t it ?
> > > > >
> > > > > >Вторник, 26 ноября 2019, 14:06 +03:00 от Ilya Kasnacheev <
> > > > > ilya.kasnach...@gmail.com>:
> > > > > >
> > > > > >Hello!
> > > > > >
> > > > > >The problem here is that Solr is a multi-year effort by a lot of
> > > people.
> > > > > We
> > > > > >can't match that.
> > > > > >
> > > > > >Maybe we could integrate with Solr/Solr Cloud instead, by feeding
> our
> > > > > cache
> > > > > >information into their storage for indexing and relying on their
> own
> > > > > >mechanisms for distributed IR sorting?
> > > > > >
> > > > > >Regards,
> > > > > >--
> > > > > >Ilya Kasnacheev
> > > > > >
> > > > > >
> > > > > >вт, 26 нояб. 2019 г. в 13:59, Zhenya Stanilovsky <
> > > > > arzamas...@mail.ru.invalid
> > > > > >>:
> > > > > >
> > > > > >>
> > > > > >> Ilya Kasnacheev, what a problem in Solr with Ignite
> functionality ?
> > > > > >>
> > > > > >> thanks !
> > > > > >>
> > > > > >> >Вторник, 26 ноября 2019, 13:50 +03:00 от Ilya Kasnacheev <
> > > > > >>  ilya.kasnach...@gmail.com >:
> > > > > >> >
> > > > > >> >Hello!
> > > > > >> >
> > > > > >> >I have a hunch that we are trying to build Apache Solr (or Solr
> > > Cloud)
> > > > > >> into
> > > > > >> >Apache Ignite. I think that's a lot of effort that is not very
> > > > > justified.
> > > > > >> >
> > > > > >> >I don't think we should try to implement sorting in Apache
> Ignite,
> > > > > because
> > > > > >> >it is a lot of work, and a lot of code in our code base which
> we
> > > don't
> > > > > >> >really want.
> > > > > >> >
> > > > > >> >Regards,
> > > > > >> >--
> > > > > >> >Ilya Kasnacheev
> > > > > >> >
> > > > > >> >
> > > > > >> >пт, 22 нояб. 2019 г. в 20:59, Yuriy Shuliga <
> shul...@gmail.com
> > > >:
> > > > > >> >
> > > > > >> >> Dear Igniters,
> > > > > >> >>
> > > > > >> >> The first part of TextQuery improvement - a result limit -
> was
> > > > > developed
> > > > > >> >> and merged.
> > > > > >> >> Now we have to develop most important functionality here -
> proper
> > > > > >> sorting
> > > > > >> >> of Lucene index response and correct reducing of them for
> > > distributed
> > > > > >> >> queries.
> > > > > >> >>
> > > > > >> >> *There are two Lucene based aspects*
> > > > > >> >>
> > > > > >> >> 1. In case of using no sorting fields, the documents in
> response
> > > are
> > > > > >> still
> > > > > >> >> ordered by relevance.
> > > > > >> >> Actually this is ScoreDoc.score value.
> > > > > >> >> In order to reduce the 

Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-11-28 Thread Ivan Pavlukhin
Folks, Yuriy,

I suppose that we are going to proceed with

>>>
Reducing on Ignite

The obvious point of distributed response reduction is class
GridCacheDistributedQueryFuture.
Though, @Ivan Pavlukhin mentioned class with similar functionality:
ReduceIndexSorted
What I see here, that it is tangled with H2 related classes
(org.h2.result.Row) and might not be unified with TextQuery reduction.
>>

From my side, there is no strong opinion that we should unify
reduction. Having a separate reduction implementation for text queries
sounds like a reasonable option to me as well.

Are there still any open questions?

On Wed, 27 Nov 2019 at 02:27, Denis Magda wrote:
>
> I don't see anything wrong if Yuriy is willing to carry on and keep
> enhancing our full-text search support that lacks basic capabilities.
>
> The basics should be available. If anybody needs an advanced feature they
> can introduce Solr or ElastiSearch into the final architecture of the app.
>
> Folks, who of us can help Yuriy with the questions asked? Most like the SQL
> experts are the best candidates here.
>
>
> -
> Denis
>
>
> On Tue, Nov 26, 2019 at 8:52 AM Ivan Pavlukhin  wrote:
>
> > Folks,
> >
> > IEP is an Ignite-specific thing. In fact, I suppose that we are
> > already doing it in ASF way by having this dev-list discussion =)
> >
> > As for me, implementing "limit" feature for text queries is not so big
> > to make an IEP. But we might need to create one for next features.
> >
> > вт, 26 нояб. 2019 г. в 15:06, Ilya Kasnacheev :
> > >
> > > Hello!
> > >
> > > ASF way should probably start with an IEP :)
> > >
> > > Regards,
> > > --
> > > Ilya Kasnacheev
> > >
> > >
> > > вт, 26 нояб. 2019 г. в 14:12, Zhenya Stanilovsky
> >  > > >:
> > >
> > > >
> > > > Ok, lets forgot Solr and go through ASF way, if Yuriy prove this
> > > > functionality is helpful and PR it, why not ?
> > > >
> > > > isn`t it ?
> > > >
> > > > >Вторник, 26 ноября 2019, 14:06 +03:00 от Ilya Kasnacheev <
> > > > ilya.kasnach...@gmail.com>:
> > > > >
> > > > >Hello!
> > > > >
> > > > >The problem here is that Solr is a multi-year effort by a lot of
> > people.
> > > > We
> > > > >can't match that.
> > > > >
> > > > >Maybe we could integrate with Solr/Solr Cloud instead, by feeding our
> > > > cache
> > > > >information into their storage for indexing and relying on their own
> > > > >mechanisms for distributed IR sorting?
> > > > >
> > > > >Regards,
> > > > >--
> > > > >Ilya Kasnacheev
> > > > >
> > > > >
> > > > >вт, 26 нояб. 2019 г. в 13:59, Zhenya Stanilovsky <
> > > > arzamas...@mail.ru.invalid
> > > > >>:
> > > > >
> > > > >>
> > > > >> Ilya Kasnacheev, what a problem in Solr with Ignite functionality ?
> > > > >>
> > > > >> thanks !
> > > > >>
> > > > >> >Вторник, 26 ноября 2019, 13:50 +03:00 от Ilya Kasnacheev <
> > > > >>  ilya.kasnach...@gmail.com >:
> > > > >> >
> > > > >> >Hello!
> > > > >> >
> > > > >> >I have a hunch that we are trying to build Apache Solr (or Solr
> > Cloud)
> > > > >> into
> > > > >> >Apache Ignite. I think that's a lot of effort that is not very
> > > > justified.
> > > > >> >
> > > > >> >I don't think we should try to implement sorting in Apache Ignite,
> > > > because
> > > > >> >it is a lot of work, and a lot of code in our code base which we
> > don't
> > > > >> >really want.
> > > > >> >
> > > > >> >Regards,
> > > > >> >--
> > > > >> >Ilya Kasnacheev
> > > > >> >
> > > > >> >
> > > > >> >пт, 22 нояб. 2019 г. в 20:59, Yuriy Shuliga <  shul...@gmail.com
> > >:
> > > > >> >
> > > > >> >> Dear Igniters,
> > > > >> >>
> > > > >> >> The first part of TextQuery improvement - a result limit - was
> > > > developed
> > > > >> >> and merged.
> > > > >> >> Now we have to develop most important functionality here - proper
> > > > >> sorting
> > > > >> >> of Lucene index response and correct reducing of them for
> > distributed
> > > > >> >> queries.
> > > > >> >>
> > > > >> >> *There are two Lucene based aspects*
> > > > >> >>
> > > > >> >> 1. In case of using no sorting fields, the documents in response
> > are
> > > > >> still
> > > > >> >> ordered by relevance.
> > > > >> >> Actually this is ScoreDoc.score value.
> > > > >> >> In order to reduce the distributed results correctly, the score
> > > > should
> > > > >> be
> > > > >> >> passed with response.
> > > > >> >>
> > > > >> >> 2. When sorting by conventional fields, then Lucene should have
> > these
> > > > >> >> fields properly indexed and
> > > > >> >> corresponding Sort object should be applied to Lucene's search
> > call.
> > > > >> >> In order to mark those fields a new annotation like '@SortField'
> > may
> > > > be
> > > > >> >> introduced.
> > > > >> >>
> > > > >> >> *Reducing on Ignite *
> > > > >> >>
> > > > >> >> The obvious point of distributed response reduction is class
> > > > >> >> GridCacheDistributedQueryFuture.
> > > > >> >> Though, @Ivan Pavlukhin mentioned class with similar
> > functionality:
> > > > >> >> ReduceIndexSorted
> > > > >> >> What I see here, that it is tangled with 

Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-11-26 Thread Denis Magda
I don't see anything wrong if Yuriy is willing to carry on and keep
enhancing our full-text search support, which lacks basic capabilities.

The basics should be available. If anybody needs an advanced feature, they
can introduce Solr or Elasticsearch into the final architecture of the app.

Folks, which of us can help Yuriy with the questions asked? Most likely the SQL
experts are the best candidates here.


-
Denis


On Tue, Nov 26, 2019 at 8:52 AM Ivan Pavlukhin  wrote:

> Folks,
>
> IEP is an Ignite-specific thing. In fact, I suppose that we are
> already doing it in ASF way by having this dev-list discussion =)
>
> As for me, implementing "limit" feature for text queries is not so big
> to make an IEP. But we might need to create one for next features.
>
> вт, 26 нояб. 2019 г. в 15:06, Ilya Kasnacheev :
> >
> > Hello!
> >
> > ASF way should probably start with an IEP :)
> >
> > Regards,
> > --
> > Ilya Kasnacheev
> >
> >
> > вт, 26 нояб. 2019 г. в 14:12, Zhenya Stanilovsky
>  > >:
> >
> > >
> > > Ok, lets forgot Solr and go through ASF way, if Yuriy prove this
> > > functionality is helpful and PR it, why not ?
> > >
> > > isn`t it ?
> > >
> > > >Вторник, 26 ноября 2019, 14:06 +03:00 от Ilya Kasnacheev <
> > > ilya.kasnach...@gmail.com>:
> > > >
> > > >Hello!
> > > >
> > > >The problem here is that Solr is a multi-year effort by a lot of
> people.
> > > We
> > > >can't match that.
> > > >
> > > >Maybe we could integrate with Solr/Solr Cloud instead, by feeding our
> > > cache
> > > >information into their storage for indexing and relying on their own
> > > >mechanisms for distributed IR sorting?
> > > >
> > > >Regards,
> > > >--
> > > >Ilya Kasnacheev
> > > >
> > > >
> > > >вт, 26 нояб. 2019 г. в 13:59, Zhenya Stanilovsky <
> > > arzamas...@mail.ru.invalid
> > > >>:
> > > >
> > > >>
> > > >> Ilya Kasnacheev, what a problem in Solr with Ignite functionality ?
> > > >>
> > > >> thanks !
> > > >>
> > > >> >Вторник, 26 ноября 2019, 13:50 +03:00 от Ilya Kasnacheev <
> > > >>  ilya.kasnach...@gmail.com >:
> > > >> >
> > > >> >Hello!
> > > >> >
> > > >> >I have a hunch that we are trying to build Apache Solr (or Solr
> Cloud)
> > > >> into
> > > >> >Apache Ignite. I think that's a lot of effort that is not very
> > > justified.
> > > >> >
> > > >> >I don't think we should try to implement sorting in Apache Ignite,
> > > because
> > > >> >it is a lot of work, and a lot of code in our code base which we
> don't
> > > >> >really want.
> > > >> >
> > > >> >Regards,
> > > >> >--
> > > >> >Ilya Kasnacheev
> > > >> >
> > > >> >
> > > >> >пт, 22 нояб. 2019 г. в 20:59, Yuriy Shuliga <  shul...@gmail.com
> >:
> > > >> >
> > > >> >> Dear Igniters,
> > > >> >>
> > > >> >> The first part of TextQuery improvement - a result limit - was
> > > developed
> > > >> >> and merged.
> > > >> >> Now we have to develop most important functionality here - proper
> > > >> sorting
> > > >> >> of Lucene index response and correct reducing of them for
> distributed
> > > >> >> queries.
> > > >> >>
> > > >> >> *There are two Lucene based aspects*
> > > >> >>
> > > >> >> 1. In case of using no sorting fields, the documents in response
> are
> > > >> still
> > > >> >> ordered by relevance.
> > > >> >> Actually this is ScoreDoc.score value.
> > > >> >> In order to reduce the distributed results correctly, the score
> > > should
> > > >> be
> > > >> >> passed with response.
> > > >> >>
> > > >> >> 2. When sorting by conventional fields, then Lucene should have
> these
> > > >> >> fields properly indexed and
> > > >> >> corresponding Sort object should be applied to Lucene's search
> call.
> > > >> >> In order to mark those fields a new annotation like '@SortField'
> may
> > > be
> > > >> >> introduced.
> > > >> >>
> > > >> >> *Reducing on Ignite *
> > > >> >>
> > > >> >> The obvious point of distributed response reduction is class
> > > >> >> GridCacheDistributedQueryFuture.
> > > >> >> Though, @Ivan Pavlukhin mentioned class with similar
> functionality:
> > > >> >> ReduceIndexSorted
> > > >> >> What I see here, that it is tangled with H2 related classes (
> > > >> >> org.h2.result.Row) and might not be unified with TextQuery
> reduction.
> > > >> >>
> > > >> >> Still need a support here.
> > > >> >>
> > > >> >> Overall, the goal of this letter is to initiate discussion on
> > > TextQuery
> > > >> >> Sorting implementation and come closer to ticket creation.
> > > >> >>
> > > >> >> BR,
> > > >> >> Yuriy Shuliha
> > > >> >>
> > > >> >> вт, 22 жовт. 2019 о 13:31 Andrey Mashenkov <
> > > andrey.mashen...@gmail.com
> > > >> >
> > > >> >> пише:
> > > >> >>
> > > >> >> > Hi Dmitry, Yuriy.
> > > >> >> >
> > > >> >> > I've found GridCacheQueryFutureAdapter has newly added
> > > AtomicInteger
> > > >> >> > 'total' field and 'limit; field as primitive int.
> > > >> >> >
> > > >> >> > Both fields are used inside synchronized block only.
> > > >> >> > So, we can make both private and downgrade AtomicInteger to
> > > primitive
> > > >> >> int.
> > > >> >> >
> 

Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-11-26 Thread Ivan Pavlukhin
Folks,

IEP is an Ignite-specific thing. In fact, I suppose that we are
already doing it the ASF way by having this dev-list discussion =)

As for me, implementing the "limit" feature for text queries is not big
enough to warrant an IEP. But we might need to create one for the next features.

On Tue, 26 Nov 2019 at 15:06, Ilya Kasnacheev wrote:
>
> Hello!
>
> ASF way should probably start with an IEP :)
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> вт, 26 нояб. 2019 г. в 14:12, Zhenya Stanilovsky  >:
>
> >
> > Ok, lets forgot Solr and go through ASF way, if Yuriy prove this
> > functionality is helpful and PR it, why not ?
> >
> > isn`t it ?
> >
> > >Вторник, 26 ноября 2019, 14:06 +03:00 от Ilya Kasnacheev <
> > ilya.kasnach...@gmail.com>:
> > >
> > >Hello!
> > >
> > >The problem here is that Solr is a multi-year effort by a lot of people.
> > We
> > >can't match that.
> > >
> > >Maybe we could integrate with Solr/Solr Cloud instead, by feeding our
> > cache
> > >information into their storage for indexing and relying on their own
> > >mechanisms for distributed IR sorting?
> > >
> > >Regards,
> > >--
> > >Ilya Kasnacheev
> > >
> > >
> > >вт, 26 нояб. 2019 г. в 13:59, Zhenya Stanilovsky <
> > arzamas...@mail.ru.invalid
> > >>:
> > >
> > >>
> > >> Ilya Kasnacheev, what a problem in Solr with Ignite functionality ?
> > >>
> > >> thanks !
> > >>
> > >> >Вторник, 26 ноября 2019, 13:50 +03:00 от Ilya Kasnacheev <
> > >>  ilya.kasnach...@gmail.com >:
> > >> >
> > >> >Hello!
> > >> >
> > >> >I have a hunch that we are trying to build Apache Solr (or Solr Cloud)
> > >> into
> > >> >Apache Ignite. I think that's a lot of effort that is not very
> > justified.
> > >> >
> > >> >I don't think we should try to implement sorting in Apache Ignite,
> > because
> > >> >it is a lot of work, and a lot of code in our code base which we don't
> > >> >really want.
> > >> >
> > >> >Regards,
> > >> >--
> > >> >Ilya Kasnacheev
> > >> >
> > >> >
> > >> >пт, 22 нояб. 2019 г. в 20:59, Yuriy Shuliga <  shul...@gmail.com >:
> > >> >
> > >> >> Dear Igniters,
> > >> >>
> > >> >> The first part of TextQuery improvement - a result limit - was
> > developed
> > >> >> and merged.
> > >> >> Now we have to develop most important functionality here - proper
> > >> sorting
> > >> >> of Lucene index response and correct reducing of them for distributed
> > >> >> queries.
> > >> >>
> > >> >> *There are two Lucene based aspects*
> > >> >>
> > >> >> 1. In case of using no sorting fields, the documents in response are
> > >> still
> > >> >> ordered by relevance.
> > >> >> Actually this is ScoreDoc.score value.
> > >> >> In order to reduce the distributed results correctly, the score
> > should
> > >> be
> > >> >> passed with response.
> > >> >>
> > >> >> 2. When sorting by conventional fields, then Lucene should have these
> > >> >> fields properly indexed and
> > >> >> corresponding Sort object should be applied to Lucene's search call.
> > >> >> In order to mark those fields a new annotation like '@SortField' may
> > be
> > >> >> introduced.
> > >> >>
> > >> >> *Reducing on Ignite *
> > >> >>
> > >> >> The obvious point of distributed response reduction is class
> > >> >> GridCacheDistributedQueryFuture.
> > >> >> Though, @Ivan Pavlukhin mentioned class with similar functionality:
> > >> >> ReduceIndexSorted
> > >> >> What I see here, that it is tangled with H2 related classes (
> > >> >> org.h2.result.Row) and might not be unified with TextQuery reduction.
> > >> >>
> > >> >> Still need a support here.
> > >> >>
> > >> >> Overall, the goal of this letter is to initiate discussion on
> > TextQuery
> > >> >> Sorting implementation and come closer to ticket creation.
> > >> >>
> > >> >> BR,
> > >> >> Yuriy Shuliha
> > >> >>
> > >> >> вт, 22 жовт. 2019 о 13:31 Andrey Mashenkov <
> > andrey.mashen...@gmail.com
> > >> >
> > >> >> пише:
> > >> >>
> > >> >> > Hi Dmitry, Yuriy.
> > >> >> >
> > >> >> > I've found GridCacheQueryFutureAdapter has newly added
> > AtomicInteger
> > >> >> > 'total' field and 'limit; field as primitive int.
> > >> >> >
> > >> >> > Both fields are used inside synchronized block only.
> > >> >> > So, we can make both private and downgrade AtomicInteger to
> > primitive
> > >> >> int.
> > >> >> >
> > >> >> > Most likely, these fields can be replaced with one field.
> > >> >> >
> > >> >> >
> > >> >> >
> > >> >> > On Mon, Oct 21, 2019 at 10:01 PM Dmitriy Pavlov <
> > dpav...@apache.org
> > >> >
> > >> >> > wrote:
> > >> >> >
> > >> >> > > Hi Andrey,
> > >> >> > >
> > >> >> > > I've checked this ticket comments, and there is a TC Bot visa
> > (with
> > >> no
> > >> >> > > blockers).
> > >> >> > >
> > >> >> > > Do you have any concerns related to this patch?
> > >> >> > >
> > >> >> > > Sincerely,
> > >> >> > > Dmitriy Pavlov
> > >> >> > >
> > >> >> > > чт, 17 окт. 2019 г. в 16:43, Yuriy Shuliga <  shul...@gmail.com
> > >:
> > >> >> > >
> > >> >> > >> Andrey,
> > >> >> > >>
> > >> >> > >> Per you request, I created ticket
> > >> >> > >>  

Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-11-26 Thread Ilya Kasnacheev
Hello!

The ASF way should probably start with an IEP :)

Regards,
-- 
Ilya Kasnacheev


On Tue, 26 Nov 2019 at 14:12, Zhenya Stanilovsky wrote:

>
> Ok, lets forgot Solr and go through ASF way, if Yuriy prove this
> functionality is helpful and PR it, why not ?
>
> isn`t it ?
>
> >Вторник, 26 ноября 2019, 14:06 +03:00 от Ilya Kasnacheev <
> ilya.kasnach...@gmail.com>:
> >
> >Hello!
> >
> >The problem here is that Solr is a multi-year effort by a lot of people.
> We
> >can't match that.
> >
> >Maybe we could integrate with Solr/Solr Cloud instead, by feeding our
> cache
> >information into their storage for indexing and relying on their own
> >mechanisms for distributed IR sorting?
> >
> >Regards,
> >--
> >Ilya Kasnacheev
> >
> >
> >вт, 26 нояб. 2019 г. в 13:59, Zhenya Stanilovsky <
> arzamas...@mail.ru.invalid
> >>:
> >
> >>
> >> Ilya Kasnacheev, what a problem in Solr with Ignite functionality ?
> >>
> >> thanks !
> >>
> >> >Вторник, 26 ноября 2019, 13:50 +03:00 от Ilya Kasnacheev <
> >>  ilya.kasnach...@gmail.com >:
> >> >
> >> >Hello!
> >> >
> >> >I have a hunch that we are trying to build Apache Solr (or Solr Cloud)
> >> into
> >> >Apache Ignite. I think that's a lot of effort that is not very
> justified.
> >> >
> >> >I don't think we should try to implement sorting in Apache Ignite,
> because
> >> >it is a lot of work, and a lot of code in our code base which we don't
> >> >really want.
> >> >
> >> >Regards,
> >> >--
> >> >Ilya Kasnacheev
> >> >
> >> >
> >> >пт, 22 нояб. 2019 г. в 20:59, Yuriy Shuliga <  shul...@gmail.com >:
> >> >
> >> >> Dear Igniters,
> >> >>
> >> >> The first part of TextQuery improvement - a result limit - was
> developed
> >> >> and merged.
> >> >> Now we have to develop most important functionality here - proper
> >> sorting
> >> >> of Lucene index response and correct reducing of them for distributed
> >> >> queries.
> >> >>
> >> >> *There are two Lucene based aspects*
> >> >>
> >> >> 1. In case of using no sorting fields, the documents in response are
> >> still
> >> >> ordered by relevance.
> >> >> Actually this is ScoreDoc.score value.
> >> >> In order to reduce the distributed results correctly, the score
> should
> >> be
> >> >> passed with response.
> >> >>
> >> >> 2. When sorting by conventional fields, then Lucene should have these
> >> >> fields properly indexed and
> >> >> corresponding Sort object should be applied to Lucene's search call.
> >> >> In order to mark those fields a new annotation like '@SortField' may
> be
> >> >> introduced.
> >> >>
> >> >> *Reducing on Ignite *
> >> >>
> >> >> The obvious point of distributed response reduction is class
> >> >> GridCacheDistributedQueryFuture.
> >> >> Though, @Ivan Pavlukhin mentioned class with similar functionality:
> >> >> ReduceIndexSorted
> >> >> What I see here, that it is tangled with H2 related classes (
> >> >> org.h2.result.Row) and might not be unified with TextQuery reduction.
> >> >>
> >> >> Still need a support here.
> >> >>
> >> >> Overall, the goal of this letter is to initiate discussion on
> TextQuery
> >> >> Sorting implementation and come closer to ticket creation.
> >> >>
> >> >> BR,
> >> >> Yuriy Shuliha
> >> >>
> >> >> вт, 22 жовт. 2019 о 13:31 Andrey Mashenkov <
> andrey.mashen...@gmail.com
> >> >
> >> >> пише:
> >> >>
> >> >> > Hi Dmitry, Yuriy.
> >> >> >
> >> >> > I've found GridCacheQueryFutureAdapter has newly added
> AtomicInteger
> >> >> > 'total' field and 'limit; field as primitive int.
> >> >> >
> >> >> > Both fields are used inside synchronized block only.
> >> >> > So, we can make both private and downgrade AtomicInteger to
> primitive
> >> >> int.
> >> >> >
> >> >> > Most likely, these fields can be replaced with one field.
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Mon, Oct 21, 2019 at 10:01 PM Dmitriy Pavlov <
> dpav...@apache.org
> >> >
> >> >> > wrote:
> >> >> >
> >> >> > > Hi Andrey,
> >> >> > >
> >> >> > > I've checked this ticket comments, and there is a TC Bot visa
> (with
> >> no
> >> >> > > blockers).
> >> >> > >
> >> >> > > Do you have any concerns related to this patch?
> >> >> > >
> >> >> > > Sincerely,
> >> >> > > Dmitriy Pavlov
> >> >> > >
> >> >> > > чт, 17 окт. 2019 г. в 16:43, Yuriy Shuliga <  shul...@gmail.com
> >:
> >> >> > >
> >> >> > >> Andrey,
> >> >> > >>
> >> >> > >> Per you request, I created ticket
> >> >> > >>  https://issues.apache.org/jira/browse/IGNITE-12291 linked to
> >> >> > >>
> >>  https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12189
> >> >> > >>
> >> >> > >> Could you please proceed with PR merge ?
> >> >> > >>
> >> >> > >> BR,
> >> >> > >> Yuriy Shuliha
> >> >> > >>
> >> >> > >> ср, 9 жовт. 2019 о 12:52 Andrey Mashenkov <
> >>  andrey.mashen...@gmail.com
> >> >> >
> >> >> > >> пише:
> >> >> > >>
> >> >> > >> > Hi Yuri,
> >> >> > >> >
> >> >> > >> > To get access to TC Bot you should register as TeamCity user
> >> [1], if
> >> >> > you
> >> >> > >> > didn't do this already.
> >> >> > >> > Then you will be able to authorize on Ignite TC 

Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-11-26 Thread Zhenya Stanilovsky

OK, let's forget Solr and go the ASF way: if Yuriy proves this functionality
is helpful and submits a PR, why not?
 
Isn't that so?
  
>Tuesday, 26 November 2019, 14:06 +03:00 from Ilya Kasnacheev:
> 
>Hello!
>
>The problem here is that Solr is a multi-year effort by a lot of people. We
>can't match that.
>
>Maybe we could integrate with Solr/Solr Cloud instead, by feeding our cache
>information into their storage for indexing and relying on their own
>mechanisms for distributed IR sorting?
>
>Regards,
>--
>Ilya Kasnacheev
>
>
>On Tue, 26 Nov 2019 at 13:59, Zhenya Stanilovsky <arzamas...@mail.ru.invalid> wrote:
>
>>
>> Ilya Kasnacheev, what a problem in Solr with Ignite functionality ?
>>
>> thanks !
>>
>> >Tuesday, 26 November 2019, 13:50 +03:00 from Ilya Kasnacheev <ilya.kasnach...@gmail.com>:
>> >
>> >Hello!
>> >
>> >I have a hunch that we are trying to build Apache Solr (or Solr Cloud)
>> into
>> >Apache Ignite. I think that's a lot of effort that is not very justified.
>> >
>> >I don't think we should try to implement sorting in Apache Ignite, because
>> >it is a lot of work, and a lot of code in our code base which we don't
>> >really want.
>> >
>> >Regards,
>> >--
>> >Ilya Kasnacheev
>> >
>> >
>> >On Fri, 22 Nov 2019 at 20:59, Yuriy Shuliga <shul...@gmail.com> wrote:
>> >
>> >> Dear Igniters,
>> >>
>> >> The first part of the TextQuery improvement - a result limit - was developed
>> >> and merged.
>> >> Now we have to develop the most important functionality here - proper
>> >> sorting of the Lucene index response and correct reduction of it for
>> >> distributed queries.
>> >>
>> >> *There are two Lucene-based aspects*
>> >>
>> >> 1. When no sorting fields are used, the documents in the response are
>> >> still ordered by relevance.
>> >> This is actually the ScoreDoc.score value.
>> >> In order to reduce the distributed results correctly, the score should
>> >> be passed with the response.
>> >>
>> >> 2. When sorting by conventional fields, Lucene should have these
>> >> fields properly indexed, and the
>> >> corresponding Sort object should be applied to Lucene's search call.
>> >> In order to mark those fields, a new annotation like '@SortField' may be
>> >> introduced.
>> >>
>> >> *Reducing on Ignite*
>> >>
>> >> The obvious point of distributed response reduction is the class
>> >> GridCacheDistributedQueryFuture.
>> >> However, @Ivan Pavlukhin mentioned a class with similar functionality:
>> >> ReduceIndexSorted
>> >> What I see here is that it is tangled with H2-related classes
>> >> (org.h2.result.Row) and might not be unifiable with TextQuery reduction.
>> >>
>> >> Support is still needed here.
>> >>
>> >> Overall, the goal of this letter is to initiate a discussion on the TextQuery
>> >> sorting implementation and come closer to ticket creation.
>> >>
>> >> BR,
>> >> Yuriy Shuliha
>> >>
>> >> On Tue, 22 Oct 2019 at 13:31, Andrey Mashenkov <andrey.mashen...@gmail.com> wrote:
>> >>
>> >> > Hi Dmitry, Yuriy.
>> >> >
>> >> > I've found GridCacheQueryFutureAdapter has a newly added AtomicInteger
>> >> > 'total' field and a 'limit' field as a primitive int.
>> >> >
>> >> > Both fields are used inside a synchronized block only.
>> >> > So, we can make both private and downgrade AtomicInteger to a primitive
>> >> > int.
>> >> >
>> >> > Most likely, these fields can be replaced with one field.
>> >> >
>> >> >
>> >> >
>> >> > On Mon, Oct 21, 2019 at 10:01 PM Dmitriy Pavlov <  dpav...@apache.org
>> >
>> >> > wrote:
>> >> >
>> >> > > Hi Andrey,
>> >> > >
>> >> > > I've checked this ticket comments, and there is a TC Bot visa (with
>> >> > > no blockers).
>> >> > >
>> >> > > Do you have any concerns related to this patch?
>> >> > >
>> >> > > Sincerely,
>> >> > > Dmitriy Pavlov
>> >> > >
>> >> > > On Thu, 17 Oct 2019 at 16:43, Yuriy Shuliga <shul...@gmail.com> wrote:
>> >> > >
>> >> > >> Andrey,
>> >> > >>
>> >> > >> Per your request, I created ticket
>> >> > >>  https://issues.apache.org/jira/browse/IGNITE-12291 linked to
>> >> > >>  https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12189
>> >> > >>
>> >> > >> Could you please proceed with PR merge ?
>> >> > >>
>> >> > >> BR,
>> >> > >> Yuriy Shuliha
>> >> > >>
>> >> > >> On Wed, 9 Oct 2019 at 12:52, Andrey Mashenkov <andrey.mashen...@gmail.com> wrote:
>> >> > >>
>> >> > >> > Hi Yuri,
>> >> > >> >
>> >> > >> > To get access to TC Bot you should register as a TeamCity user
>> >> > >> > [1], if you didn't do this already.
>> >> > >> > Then you will be able to authorize on the Ignite TC Bot page with
>> >> > >> > the same credentials.
>> >> > >> >
>> >> > >> > [1]  https://ci.ignite.apache.org/registerUser.html
>> >> > >> >
>> >> > >> > On Fri, Oct 4, 2019 at 3:10 PM Yuriy Shuliga <shul...@gmail.com> wrote:
>> >> > >> >
>> >> > >> >> Andrew,
>> >> > >> >>
>> >> > >> >> I have corrected PR according to your notes. Please review.
>> >> > >> >> What will be the next steps in order to merge in?
>> >> > >> >>
>> >> > >> >> Y.
>> >> > >> >>
>> >> > >> >> 

Re: Re[2]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-11-26 Thread Ilya Kasnacheev
Hello!

The problem here is that Solr is a multi-year effort by a lot of people. We
can't match that.

Maybe we could integrate with Solr/Solr Cloud instead, by feeding our cache
information into their storage for indexing and relying on their own
mechanisms for distributed IR sorting?

Regards,
-- 
Ilya Kasnacheev


Tue, Nov 26, 2019 at 13:59, Zhenya Stanilovsky:

>
> Ilya Kasnacheev, what is the problem with having Solr functionality in Ignite?
>
> Thanks!
>
> >Tuesday, November 26, 2019, 13:50 +03:00, from Ilya Kasnacheev <
> ilya.kasnach...@gmail.com>:
> >
> >Hello!
> >
> >I have a hunch that we are trying to build Apache Solr (or Solr Cloud)
> into
> >Apache Ignite. I think that's a lot of effort that is not very justified.
> >
> >I don't think we should try to implement sorting in Apache Ignite, because
> >it is a lot of work, and a lot of code in our code base which we don't
> >really want.
> >
> >Regards,
> >--
> >Ilya Kasnacheev
> >
> >
> >Fri, Nov 22, 2019 at 20:59, Yuriy Shuliga < shul...@gmail.com >:
> >
> >> Dear Igniters,
> >>
> >> The first part of TextQuery improvement - a result limit - was developed
> >> and merged.
> >> Now we have to develop most important functionality here - proper
> sorting
> >> of Lucene index response and correct reducing of them for distributed
> >> queries.
> >>
> >> *There are two Lucene based aspects*
> >>
> >> 1. In case of using no sorting fields, the documents in response are
> still
> >> ordered by relevance.
> >> Actually this is ScoreDoc.score value.
> >> In order to reduce the distributed results correctly, the score should
> be
> >> passed with response.
> >>
> >> 2. When sorting by conventional fields, then Lucene should have these
> >> fields properly indexed and
> >> corresponding Sort object should be applied to Lucene's search call.
> >> In order to mark those fields a new annotation like '@SortField' may be
> >> introduced.
> >>
> >> *Reducing on Ignite *
> >>
> >> The obvious point of distributed response reduction is class
> >> GridCacheDistributedQueryFuture.
> >> Though, @Ivan Pavlukhin mentioned class with similar functionality:
> >> ReduceIndexSorted
> >> What I see here, that it is tangled with H2 related classes (
> >> org.h2.result.Row) and might not be unified with TextQuery reduction.
> >>
> >> Still need a support here.
> >>
> >> Overall, the goal of this letter is to initiate discussion on TextQuery
> >> Sorting implementation and come closer to ticket creation.
> >>
> >> BR,
> >> Yuriy Shuliha
> >>
> >> вт, 22 жовт. 2019 о 13:31 Andrey Mashenkov < andrey.mashen...@gmail.com
> >
> >> пише:
> >>
> >> > Hi Dmitry, Yuriy.
> >> >
> >> > I've found GridCacheQueryFutureAdapter has newly added AtomicInteger
> >> > 'total' field and 'limit; field as primitive int.
> >> >
> >> > Both fields are used inside synchronized block only.
> >> > So, we can make both private and downgrade AtomicInteger to primitive
> >> int.
> >> >
> >> > Most likely, these fields can be replaced with one field.
> >> >
> >> >
> >> >
> >> > On Mon, Oct 21, 2019 at 10:01 PM Dmitriy Pavlov < dpav...@apache.org
> >
> >> > wrote:
> >> >
> >> > > Hi Andrey,
> >> > >
> >> > > I've checked this ticket comments, and there is a TC Bot visa (with
> no
> >> > > blockers).
> >> > >
> >> > > Do you have any concerns related to this patch?
> >> > >
> >> > > Sincerely,
> >> > > Dmitriy Pavlov
> >> > >
> >> > > чт, 17 окт. 2019 г. в 16:43, Yuriy Shuliga < shul...@gmail.com >:
> >> > >
> >> > >> Andrey,
> >> > >>
> >> > >> Per you request, I created ticket
> >> > >>  https://issues.apache.org/jira/browse/IGNITE-12291 linked to
> >> > >>
> https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12189
> >> > >>
> >> > >> Could you please proceed with PR merge ?
> >> > >>
> >> > >> BR,
> >> > >> Yuriy Shuliha
> >> > >>
> >> > >> ср, 9 жовт. 2019 о 12:52 Andrey Mashenkov <
> andrey.mashen...@gmail.com
> >> >
> >> > >> пише:
> >> > >>
> >> > >> > Hi Yuri,
> >> > >> >
> >> > >> > To get access to TC Bot you should register as TeamCity user
> [1], if
> >> > you
> >> > >> > didn't do this already.
> >> > >> > Then you will be able to authorize on Ignite TC Bot page with
> same
> >> > >> > credentials.
> >> > >> >
> >> > >> > [1]  https://ci.ignite.apache.org/registerUser.html
> >> > >> >
> >> > >> > On Fri, Oct 4, 2019 at 3:10 PM Yuriy Shuliga < shul...@gmail.com
> >
> >> > wrote:
> >> > >> >
> >> > >> >> Andrew,
> >> > >> >>
> >> > >> >> I have corrected PR according to your notes. Please review.
> >> > >> >> What will be the next steps in order to merge in?
> >> > >> >>
> >> > >> >> Y.
> >> > >> >>
> >> > >> >> чт, 3 жовт. 2019 о 17:47 Andrey Mashenkov <
> >> >  andrey.mashen...@gmail.com >
> >> > >> >> пише:
> >> > >> >>
> >> > >> >> > Yuri,
> >> > >> >> >
> >> > >> >> > I've done with review.
> >> > >> >> > No crime found, but trivial compatibility bug.
> >> > >> >> >
> >> > >> >> > On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga <
> shul...@gmail.com >
> >> > >> wrote:
> >> > >> >> >
> >> > >> >> 

Re[2]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-11-26 Thread Zhenya Stanilovsky

Ilya Kasnacheev, what is the problem with having Solr functionality in Ignite?

Thanks!
  
>Tuesday, November 26, 2019, 13:50 +03:00, from Ilya Kasnacheev:
> 
>Hello!
>
>I have a hunch that we are trying to build Apache Solr (or Solr Cloud) into
>Apache Ignite. I think that's a lot of effort that is not very justified.
>
>I don't think we should try to implement sorting in Apache Ignite, because
>it is a lot of work, and a lot of code in our code base which we don't
>really want.
>
>Regards,
>--
>Ilya Kasnacheev
>
>
>Fri, Nov 22, 2019 at 20:59, Yuriy Shuliga < shul...@gmail.com >:
> 
>> Dear Igniters,
>>
>> The first part of TextQuery improvement - a result limit - was developed
>> and merged.
>> Now we have to develop most important functionality here - proper sorting
>> of Lucene index response and correct reducing of them for distributed
>> queries.
>>
>> *There are two Lucene based aspects*
>>
>> 1. In case of using no sorting fields, the documents in response are still
>> ordered by relevance.
>> Actually this is ScoreDoc.score value.
>> In order to reduce the distributed results correctly, the score should be
>> passed with response.
>>
>> 2. When sorting by conventional fields, then Lucene should have these
>> fields properly indexed and
>> corresponding Sort object should be applied to Lucene's search call.
>> In order to mark those fields a new annotation like '@SortField' may be
>> introduced.
>>
>> *Reducing on Ignite *
>>
>> The obvious point of distributed response reduction is class
>> GridCacheDistributedQueryFuture.
>> Though, @Ivan Pavlukhin mentioned class with similar functionality:
>> ReduceIndexSorted
>> What I see here, that it is tangled with H2 related classes (
>> org.h2.result.Row) and might not be unified with TextQuery reduction.
>>
>> Still need a support here.
>>
>> Overall, the goal of this letter is to initiate discussion on TextQuery
>> Sorting implementation and come closer to ticket creation.
>>
>> BR,
>> Yuriy Shuliha
>>
>> Tue, Oct 22, 2019 at 13:31, Andrey Mashenkov < andrey.mashen...@gmail.com >
>> wrote:
>>
>> > Hi Dmitry, Yuriy.
>> >
>> > I've found GridCacheQueryFutureAdapter has newly added AtomicInteger
>> > 'total' field and 'limit; field as primitive int.
>> >
>> > Both fields are used inside synchronized block only.
>> > So, we can make both private and downgrade AtomicInteger to primitive
>> int.
>> >
>> > Most likely, these fields can be replaced with one field.
>> >
>> >
>> >
>> > On Mon, Oct 21, 2019 at 10:01 PM Dmitriy Pavlov < dpav...@apache.org >
>> > wrote:
>> >
>> > > Hi Andrey,
>> > >
>> > > I've checked this ticket comments, and there is a TC Bot visa (with no
>> > > blockers).
>> > >
>> > > Do you have any concerns related to this patch?
>> > >
>> > > Sincerely,
>> > > Dmitriy Pavlov
>> > >
>> > > чт, 17 окт. 2019 г. в 16:43, Yuriy Shuliga < shul...@gmail.com >:
>> > >
>> > >> Andrey,
>> > >>
>> > >> Per you request, I created ticket
>> > >>  https://issues.apache.org/jira/browse/IGNITE-12291 linked to
>> > >>  https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12189
>> > >>
>> > >> Could you please proceed with PR merge ?
>> > >>
>> > >> BR,
>> > >> Yuriy Shuliha
>> > >>
>> > >> ср, 9 жовт. 2019 о 12:52 Andrey Mashenkov < andrey.mashen...@gmail.com
>> >
>> > >> пише:
>> > >>
>> > >> > Hi Yuri,
>> > >> >
>> > >> > To get access to TC Bot you should register as TeamCity user [1], if
>> > you
>> > >> > didn't do this already.
>> > >> > Then you will be able to authorize on Ignite TC Bot page with same
>> > >> > credentials.
>> > >> >
>> > >> > [1]  https://ci.ignite.apache.org/registerUser.html
>> > >> >
>> > >> > On Fri, Oct 4, 2019 at 3:10 PM Yuriy Shuliga < shul...@gmail.com >
>> > wrote:
>> > >> >
>> > >> >> Andrew,
>> > >> >>
>> > >> >> I have corrected PR according to your notes. Please review.
>> > >> >> What will be the next steps in order to merge in?
>> > >> >>
>> > >> >> Y.
>> > >> >>
>> > >> >> чт, 3 жовт. 2019 о 17:47 Andrey Mashenkov <
>> >  andrey.mashen...@gmail.com >
>> > >> >> пише:
>> > >> >>
>> > >> >> > Yuri,
>> > >> >> >
>> > >> >> > I've done with review.
>> > >> >> > No crime found, but trivial compatibility bug.
>> > >> >> >
>> > >> >> > On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga < shul...@gmail.com >
>> > >> wrote:
>> > >> >> >
>> > >> >> > > Denis,
>> > >> >> > >
>> > >> >> > > Thank you for your attention to this.
>> > >> >> > > as for now, the
>> >  https://issues.apache.org/jira/browse/IGNITE-12189
>> > >> >> > ticket
>> > >> >> > > is still pending review.
>> > >> >> > > Do we have a chance to move it forward somehow?
>> > >> >> > >
>> > >> >> > > BR,
>> > >> >> > > Yuriy Shuliha
>> > >> >> > >
>> > >> >> > > пн, 30 вер. 2019 о 23:35 Denis Magda < dma...@apache.org > пише:
>> > >> >> > >
>> > >> >> > > > Yuriy,
>> > >> >> > > >
>> > >> >> > > > I've seen you opening a pull-request with the first changes:
>> > >> >> > > >  https://issues.apache.org/jira/browse/IGNITE-12189
>> > >> >> > > >
>> > >> >> > > > Alex 

Re: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-11-26 Thread Ilya Kasnacheev
Hello!

I have a hunch that we are trying to build Apache Solr (or Solr Cloud) into
Apache Ignite. I think that's a lot of effort that is not very justified.

I don't think we should try to implement sorting in Apache Ignite, because
it is a lot of work, and a lot of code in our code base which we don't
really want.

Regards,
-- 
Ilya Kasnacheev


Fri, Nov 22, 2019 at 20:59, Yuriy Shuliga:

> Dear Igniters,
>
> The first part of TextQuery improvement - a result limit - was developed
> and merged.
> Now we have to develop most important functionality here - proper sorting
> of Lucene index response and correct reducing of them for distributed
> queries.
>
> *There are two Lucene based aspects*
>
> 1. In case of using no sorting fields, the documents in response are still
> ordered by relevance.
> Actually this is ScoreDoc.score value.
> In order to reduce the distributed results correctly, the score should be
> passed with response.
>
> 2. When sorting by conventional fields, then Lucene should have these
> fields properly indexed and
> corresponding  Sort object should be applied to Lucene's search call.
> In order to mark those fields a new annotation like '@SortField' may be
> introduced.
>
> *Reducing on Ignite *
>
> The obvious point of distributed response reduction is class
> GridCacheDistributedQueryFuture.
> Though, @Ivan Pavlukhin mentioned class with similar functionality:
> ReduceIndexSorted
> What I see here, that it is tangled with H2 related classes (
> org.h2.result.Row) and might not be unified with TextQuery reduction.
>
> Still need a support here.
>
> Overall, the goal of this letter is to initiate discussion on TextQuery
> Sorting implementation and come closer to ticket creation.
>
> BR,
> Yuriy Shuliha
>
> Tue, Oct 22, 2019 at 13:31, Andrey Mashenkov wrote:
>
> > Hi Dmitry, Yuriy.
> >
> > I've found GridCacheQueryFutureAdapter has newly added AtomicInteger
> > 'total' field and 'limit; field as primitive int.
> >
> > Both fields are used inside synchronized block only.
> > So, we can make both private and downgrade AtomicInteger to primitive
> int.
> >
> > Most likely, these fields can be replaced with one field.
> >
> >
> >
> > On Mon, Oct 21, 2019 at 10:01 PM Dmitriy Pavlov 
> > wrote:
> >
> > > Hi Andrey,
> > >
> > > I've checked this ticket comments, and there is a TC Bot visa (with no
> > > blockers).
> > >
> > > Do you have any concerns related to this patch?
> > >
> > > Sincerely,
> > > Dmitriy Pavlov
> > >
> > > чт, 17 окт. 2019 г. в 16:43, Yuriy Shuliga :
> > >
> > >>   Andrey,
> > >>
> > >> Per you request, I created ticket
> > >> https://issues.apache.org/jira/browse/IGNITE-12291   linked to
> > >> https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12189
> > >>
> > >> Could you please proceed with PR merge ?
> > >>
> > >> BR,
> > >> Yuriy Shuliha
> > >>
> > >> ср, 9 жовт. 2019 о 12:52 Andrey Mashenkov  >
> > >> пише:
> > >>
> > >> > Hi Yuri,
> > >> >
> > >> > To get access to TC Bot you should register as TeamCity user [1], if
> > you
> > >> > didn't do this already.
> > >> > Then you will be able to authorize on Ignite TC Bot page with same
> > >> > credentials.
> > >> >
> > >> > [1] https://ci.ignite.apache.org/registerUser.html
> > >> >
> > >> > On Fri, Oct 4, 2019 at 3:10 PM Yuriy Shuliga 
> > wrote:
> > >> >
> > >> >> Andrew,
> > >> >>
> > >> >> I have corrected PR according to your notes. Please review.
> > >> >> What will be the next steps in order to merge in?
> > >> >>
> > >> >> Y.
> > >> >>
> > >> >> чт, 3 жовт. 2019 о 17:47 Andrey Mashenkov <
> > andrey.mashen...@gmail.com>
> > >> >> пише:
> > >> >>
> > >> >> > Yuri,
> > >> >> >
> > >> >> > I've done with review.
> > >> >> > No crime found, but trivial compatibility bug.
> > >> >> >
> > >> >> > On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga 
> > >> wrote:
> > >> >> >
> > >> >> > > Denis,
> > >> >> > >
> > >> >> > > Thank you for your attention to this.
> > >> >> > > as for now, the
> > https://issues.apache.org/jira/browse/IGNITE-12189
> > >> >> > ticket
> > >> >> > > is still pending review.
> > >> >> > > Do we have a chance to move it forward somehow?
> > >> >> > >
> > >> >> > > BR,
> > >> >> > > Yuriy Shuliha
> > >> >> > >
> > >> >> > > пн, 30 вер. 2019 о 23:35 Denis Magda  пише:
> > >> >> > >
> > >> >> > > > Yuriy,
> > >> >> > > >
> > >> >> > > > I've seen you opening a pull-request with the first changes:
> > >> >> > > > https://issues.apache.org/jira/browse/IGNITE-12189
> > >> >> > > >
> > >> >> > > > Alex Scherbakov and Ivan are you the right guys to do the
> > review?
> > >> >> > > >
> > >> >> > > > -
> > >> >> > > > Denis
> > >> >> > > >
> > >> >> > > >
> > >> >> > > > On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван <
> > >> vololo...@gmail.com>
> > >> >> > > wrote:
> > >> >> > > >
> > >> >> > > > > Yuriy,
> > >> >> > > > >
> > >> >> > > > > Thank you for providing details! Quite interesting.
> > >> >> > > > >
> > >> >> > > > > Yes, we already have support of distributed limit and
> merging
> > 

Re: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-11-22 Thread Yuriy Shuliga
Dear Igniters,

The first part of the TextQuery improvement, a result limit, has been
developed and merged.
Now we have to develop the most important functionality here: proper sorting
of Lucene index responses and correct reduction of them for distributed
queries.

*There are two Lucene-based aspects*

1. When no sorting fields are used, the documents in the response are still
ordered by relevance.
This is the ScoreDoc.score value.
To reduce the distributed results correctly, the score should be
passed along with the response.

2. When sorting by conventional fields, Lucene should have these
fields properly indexed, and the
corresponding Sort object should be applied to Lucene's search call.
To mark those fields, a new annotation like '@SortField' may be
introduced.
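
To make the annotation idea concrete, here is a minimal sketch of how such a
marker could be declared and discovered. Everything in it is an assumption for
illustration: '@SortField' does not exist in Ignite today, and the Instrument
entity, its field names, and the SortFieldScanner helper are hypothetical. The
sketch only finds the marked fields; building the Lucene index entries and the
Sort object from them is left out.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical marker for fields to be indexed for sorting (not an existing Ignite annotation). */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface SortField {
}

/** Illustrative entity; the field names are assumptions. */
final class Instrument {
    String name;

    @SortField
    long issued;

    @SortField
    String ticker;
}

final class SortFieldScanner {
    /** Returns the names of the fields marked with @SortField, in alphabetical order. */
    static List<String> sortableFields(Class<?> type) {
        List<String> names = new ArrayList<>();

        for (Field f : type.getDeclaredFields()) {
            if (f.isAnnotationPresent(SortField.class))
                names.add(f.getName());
        }

        // Reflection does not guarantee declaration order, so sort for a stable result.
        Collections.sort(names);

        return names;
    }
}
```

An explicit marker keeps the set of sortable fields under the developer's
control, so only the fields that really need it would be indexed for sorting.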

*Reducing on Ignite*

The obvious place for distributed response reduction is the class
GridCacheDistributedQueryFuture.
However, @Ivan Pavlukhin mentioned a class with similar functionality:
ReduceIndexSorted.
What I see is that it is tangled with H2-related classes
(org.h2.result.Row) and might not be unifiable with TextQuery reduction.
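
For discussion, a reduction that does not depend on H2 classes could look
roughly like the sketch below: a k-way merge over per-node result lists, each
already sorted by descending score. This is not the code of
GridCacheDistributedQueryFuture or ReduceIndexSorted. The ScoredMerge and
Scored names are hypothetical, and the sketch assumes that every node returns
its score together with each entry and that scores from different nodes are
comparable within one query.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

final class ScoredMerge {
    /** A result entry with the relevance score reported by its node (hypothetical type). */
    static final class Scored {
        final String key;
        final float score;

        Scored(String key, float score) {
            this.key = key;
            this.score = score;
        }
    }

    /** Current head of one node's stream plus the rest of that stream. */
    private static final class Cursor {
        Scored head;
        final Iterator<Scored> rest;

        Cursor(Iterator<Scored> it) {
            rest = it;
            head = it.next();
        }
    }

    /**
     * Merges per-node lists, each already sorted by descending score,
     * into one list sorted by descending score with at most limit entries.
     */
    static List<Scored> merge(List<List<Scored>> perNode, int limit) {
        // The cursor whose head has the highest score is polled first.
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> Float.compare(b.head.score, a.head.score));

        for (List<Scored> page : perNode) {
            if (!page.isEmpty())
                heap.add(new Cursor(page.iterator()));
        }

        List<Scored> out = new ArrayList<>();

        while (!heap.isEmpty() && out.size() < limit) {
            Cursor top = heap.poll();
            out.add(top.head);

            // Advance the stream that supplied the entry and put it back.
            if (top.rest.hasNext()) {
                top.head = top.rest.next();
                heap.add(top);
            }
        }

        return out;
    }
}
```

The heap holds one cursor per node, so the reducer only needs the current head
of each node's stream rather than the full result sets.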

I still need support here.

Overall, the goal of this letter is to initiate a discussion on the TextQuery
sorting implementation and come closer to ticket creation.

BR,
Yuriy Shuliha

Tue, Oct 22, 2019 at 13:31, Andrey Mashenkov wrote:

> Hi Dmitry, Yuriy.
>
> I've found GridCacheQueryFutureAdapter has newly added AtomicInteger
> 'total' field and 'limit; field as primitive int.
>
> Both fields are used inside synchronized block only.
> So, we can make both private and downgrade AtomicInteger to primitive int.
>
> Most likely, these fields can be replaced with one field.
>
>
>
> On Mon, Oct 21, 2019 at 10:01 PM Dmitriy Pavlov 
> wrote:
>
> > Hi Andrey,
> >
> > I've checked this ticket comments, and there is a TC Bot visa (with no
> > blockers).
> >
> > Do you have any concerns related to this patch?
> >
> > Sincerely,
> > Dmitriy Pavlov
> >
> > чт, 17 окт. 2019 г. в 16:43, Yuriy Shuliga :
> >
> >>   Andrey,
> >>
> >> Per you request, I created ticket
> >> https://issues.apache.org/jira/browse/IGNITE-12291   linked to
> >> https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12189
> >>
> >> Could you please proceed with PR merge ?
> >>
> >> BR,
> >> Yuriy Shuliha
> >>
> >> ср, 9 жовт. 2019 о 12:52 Andrey Mashenkov 
> >> пише:
> >>
> >> > Hi Yuri,
> >> >
> >> > To get access to TC Bot you should register as TeamCity user [1], if
> you
> >> > didn't do this already.
> >> > Then you will be able to authorize on Ignite TC Bot page with same
> >> > credentials.
> >> >
> >> > [1] https://ci.ignite.apache.org/registerUser.html
> >> >
> >> > On Fri, Oct 4, 2019 at 3:10 PM Yuriy Shuliga 
> wrote:
> >> >
> >> >> Andrew,
> >> >>
> >> >> I have corrected PR according to your notes. Please review.
> >> >> What will be the next steps in order to merge in?
> >> >>
> >> >> Y.
> >> >>
> >> >> чт, 3 жовт. 2019 о 17:47 Andrey Mashenkov <
> andrey.mashen...@gmail.com>
> >> >> пише:
> >> >>
> >> >> > Yuri,
> >> >> >
> >> >> > I've done with review.
> >> >> > No crime found, but trivial compatibility bug.
> >> >> >
> >> >> > On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga 
> >> wrote:
> >> >> >
> >> >> > > Denis,
> >> >> > >
> >> >> > > Thank you for your attention to this.
> >> >> > > as for now, the
> https://issues.apache.org/jira/browse/IGNITE-12189
> >> >> > ticket
> >> >> > > is still pending review.
> >> >> > > Do we have a chance to move it forward somehow?
> >> >> > >
> >> >> > > BR,
> >> >> > > Yuriy Shuliha
> >> >> > >
> >> >> > > пн, 30 вер. 2019 о 23:35 Denis Magda  пише:
> >> >> > >
> >> >> > > > Yuriy,
> >> >> > > >
> >> >> > > > I've seen you opening a pull-request with the first changes:
> >> >> > > > https://issues.apache.org/jira/browse/IGNITE-12189
> >> >> > > >
> >> >> > > > Alex Scherbakov and Ivan are you the right guys to do the
> review?
> >> >> > > >
> >> >> > > > -
> >> >> > > > Denis
> >> >> > > >
> >> >> > > >
> >> >> > > > On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван <
> >> vololo...@gmail.com>
> >> >> > > wrote:
> >> >> > > >
> >> >> > > > > Yuriy,
> >> >> > > > >
> >> >> > > > > Thank you for providing details! Quite interesting.
> >> >> > > > >
> >> >> > > > > Yes, we already have support of distributed limit and merging
> >> >> sorted
> >> >> > > > > subresults for SQL queries. E.g. ReduceIndexSorted and
> >> >> > > > > MergeStreamIterator are used for merging sorted streams.
> >> >> > > > >
> >> >> > > > > Could you please also clarify about score/relevance? Is it
> >> >> provided
> >> >> > by
> >> >> > > > > Lucene engine for each query result? I am thinking how to do
> >> >> sorted
> >> >> > > > > merge properly in this case.
> >> >> > > > >
> >> >> > > > > ср, 25 сент. 2019 г. в 18:56, Yuriy Shuliga <
> shul...@gmail.com
> >> >:
> >> >> > > > > >
> >> >> > > > > > Ivan,
> >> >> > > > > >
> >> >> > > > > > Thank you for interesting question!
> >> >> > > > > >
> >> >> > > > > > Text searches (or full 

Re: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-10-22 Thread Andrey Mashenkov
Hi Dmitry, Yuriy.

I've found that GridCacheQueryFutureAdapter has a newly added AtomicInteger
'total' field and a 'limit' field as a primitive int.

Both fields are used only inside a synchronized block.
So, we can make both private and downgrade the AtomicInteger to a primitive int.

Most likely, these fields can be replaced with one field.
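
A rough sketch of what the single-field variant might look like is below. It
is not the actual GridCacheQueryFutureAdapter code: the LimitedCollector class
and its methods are hypothetical, and it assumes a non-negative limit and that
every access goes through synchronized methods, which is what makes a plain
int sufficient.

```java
final class LimitedCollector {
    /** Guarded by "this"; stands in for a separate 'total' counter and 'limit' value. */
    private int remaining;

    LimitedCollector(int limit) {
        // Assumes limit >= 0; a "no limit" mode is left out of this sketch.
        remaining = limit;
    }

    /** Returns how many of the offered items may still be accepted. */
    synchronized int accept(int offered) {
        int taken = Math.min(offered, remaining);

        remaining -= taken;

        return taken;
    }

    /** Returns true once the limit has been used up. */
    synchronized boolean limitReached() {
        return remaining == 0;
    }
}
```

Counting down from the limit removes the need to compare two fields on each
page, and the monitor already provides the visibility that AtomicInteger
would otherwise give.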



On Mon, Oct 21, 2019 at 10:01 PM Dmitriy Pavlov  wrote:

> Hi Andrey,
>
> I've checked this ticket comments, and there is a TC Bot visa (with no
> blockers).
>
> Do you have any concerns related to this patch?
>
> Sincerely,
> Dmitriy Pavlov
>
> Thu, Oct 17, 2019 at 16:43, Yuriy Shuliga:
>
>>   Andrey,
>>
>> Per you request, I created ticket
>> https://issues.apache.org/jira/browse/IGNITE-12291   linked to
>> https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12189
>>
>> Could you please proceed with PR merge ?
>>
>> BR,
>> Yuriy Shuliha
>>
>> ср, 9 жовт. 2019 о 12:52 Andrey Mashenkov 
>> пише:
>>
>> > Hi Yuri,
>> >
>> > To get access to TC Bot you should register as TeamCity user [1], if you
>> > didn't do this already.
>> > Then you will be able to authorize on Ignite TC Bot page with same
>> > credentials.
>> >
>> > [1] https://ci.ignite.apache.org/registerUser.html
>> >
>> > On Fri, Oct 4, 2019 at 3:10 PM Yuriy Shuliga  wrote:
>> >
>> >> Andrew,
>> >>
>> >> I have corrected PR according to your notes. Please review.
>> >> What will be the next steps in order to merge in?
>> >>
>> >> Y.
>> >>
>> >> чт, 3 жовт. 2019 о 17:47 Andrey Mashenkov 
>> >> пише:
>> >>
>> >> > Yuri,
>> >> >
>> >> > I've done with review.
>> >> > No crime found, but trivial compatibility bug.
>> >> >
>> >> > On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga 
>> wrote:
>> >> >
>> >> > > Denis,
>> >> > >
>> >> > > Thank you for your attention to this.
>> >> > > as for now, the https://issues.apache.org/jira/browse/IGNITE-12189
>> >> > ticket
>> >> > > is still pending review.
>> >> > > Do we have a chance to move it forward somehow?
>> >> > >
>> >> > > BR,
>> >> > > Yuriy Shuliha
>> >> > >
>> >> > > пн, 30 вер. 2019 о 23:35 Denis Magda  пише:
>> >> > >
>> >> > > > Yuriy,
>> >> > > >
>> >> > > > I've seen you opening a pull-request with the first changes:
>> >> > > > https://issues.apache.org/jira/browse/IGNITE-12189
>> >> > > >
>> >> > > > Alex Scherbakov and Ivan are you the right guys to do the review?
>> >> > > >
>> >> > > > -
>> >> > > > Denis
>> >> > > >
>> >> > > >
>> >> > > > On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван <
>> vololo...@gmail.com>
>> >> > > wrote:
>> >> > > >
>> >> > > > > Yuriy,
>> >> > > > >
>> >> > > > > Thank you for providing details! Quite interesting.
>> >> > > > >
>> >> > > > > Yes, we already have support of distributed limit and merging
>> >> sorted
>> >> > > > > subresults for SQL queries. E.g. ReduceIndexSorted and
>> >> > > > > MergeStreamIterator are used for merging sorted streams.
>> >> > > > >
>> >> > > > > Could you please also clarify about score/relevance? Is it
>> >> provided
>> >> > by
>> >> > > > > Lucene engine for each query result? I am thinking how to do
>> >> sorted
>> >> > > > > merge properly in this case.
>> >> > > > >
>> >> > > > > ср, 25 сент. 2019 г. в 18:56, Yuriy Shuliga > >:
>> >> > > > > >
>> >> > > > > > Ivan,
>> >> > > > > >
>> >> > > > > > Thank you for interesting question!
>> >> > > > > >
>> >> > > > > > Text searches (or full text searches) are mostly
>> human-oriented.
>> >> > And
>> >> > > > the
>> >> > > > > > point of user's interest is topmost part of response.
>> >> > > > > > Then user can read it, evaluate and use the given records for
>> >> > further
>> >> > > > > > purposes.
>> >> > > > > >
>> >> > > > > > Particularly in our case, we use Ignite for operations with
>> >> > financial
>> >> > > > > data,
>> >> > > > > > and there lots of text stuff like assets names, fin.
>> >> instruments,
>> >> > > > > companies
>> >> > > > > > etc.
>> >> > > > > > In order to operate with this quickly and reliably, users
>> used
>> >> to
>> >> > > work
>> >> > > > > with
>> >> > > > > > text search, type-ahead completions, suggestions.
>> >> > > > > >
>> >> > > > > > For this purposes we are indexing particular string data in
>> >> > separate
>> >> > > > > caches.
>> >> > > > > >
>> >> > > > > > Sorting capabilities and response size limitations are very
>> >> > important
>> >> > > > > > there. As our API have to provide most relevant information
>> in
>> >> view
>> >> > > of
>> >> > > > > > limited size.
>> >> > > > > >
>> >> > > > > > Now let me comment some Ignite/Lucene perspective.
>> >> > > > > > Actually Ignite queries and Lucene returns
>> *TopDocs.scoresDocs
>> >> > > *already
>> >> > > > > > sorted by *score *(relevance). So most relevant documents
>> are on
>> >> > the
>> >> > > > top.
>> >> > > > > > And currently distributed queries responses from different
>> nodes
>> >> > are
>> >> > > > > merged
>> >> > > > > > into final query cursor queue in arbitrary way.
>> >> > > > > > So in fact we already have the score order ruined here. 

Re: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-10-21 Thread Dmitriy Pavlov
Hi Andrey,

I've checked this ticket's comments, and there is a TC Bot visa (with no
blockers).

Do you have any concerns related to this patch?

Sincerely,
Dmitriy Pavlov

Thu, Oct 17, 2019 at 16:43, Yuriy Shuliga:

>   Andrey,
>
> Per you request, I created ticket
> https://issues.apache.org/jira/browse/IGNITE-12291   linked to
> https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12189
>
> Could you please proceed with PR merge ?
>
> BR,
> Yuriy Shuliha
>
> Wed, Oct 9, 2019 at 12:52, Andrey Mashenkov wrote:
>
> > Hi Yuri,
> >
> > To get access to TC Bot you should register as TeamCity user [1], if you
> > didn't do this already.
> > Then you will be able to authorize on Ignite TC Bot page with same
> > credentials.
> >
> > [1] https://ci.ignite.apache.org/registerUser.html
> >
> > On Fri, Oct 4, 2019 at 3:10 PM Yuriy Shuliga  wrote:
> >
> >> Andrew,
> >>
> >> I have corrected PR according to your notes. Please review.
> >> What will be the next steps in order to merge in?
> >>
> >> Y.
> >>
> >> чт, 3 жовт. 2019 о 17:47 Andrey Mashenkov 
> >> пише:
> >>
> >> > Yuri,
> >> >
> >> > I've done with review.
> >> > No crime found, but trivial compatibility bug.
> >> >
> >> > On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga 
> wrote:
> >> >
> >> > > Denis,
> >> > >
> >> > > Thank you for your attention to this.
> >> > > as for now, the https://issues.apache.org/jira/browse/IGNITE-12189
> >> > ticket
> >> > > is still pending review.
> >> > > Do we have a chance to move it forward somehow?
> >> > >
> >> > > BR,
> >> > > Yuriy Shuliha
> >> > >
> >> > > пн, 30 вер. 2019 о 23:35 Denis Magda  пише:
> >> > >
> >> > > > Yuriy,
> >> > > >
> >> > > > I've seen you opening a pull-request with the first changes:
> >> > > > https://issues.apache.org/jira/browse/IGNITE-12189
> >> > > >
> >> > > > Alex Scherbakov and Ivan are you the right guys to do the review?
> >> > > >
> >> > > > -
> >> > > > Denis
> >> > > >
> >> > > >
> >> > > > On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван <
> vololo...@gmail.com>
> >> > > wrote:
> >> > > >
> >> > > > > Yuriy,
> >> > > > >
> >> > > > > Thank you for providing details! Quite interesting.
> >> > > > >
> >> > > > > Yes, we already have support of distributed limit and merging
> >> sorted
> >> > > > > subresults for SQL queries. E.g. ReduceIndexSorted and
> >> > > > > MergeStreamIterator are used for merging sorted streams.
> >> > > > >
> >> > > > > Could you please also clarify about score/relevance? Is it
> >> provided
> >> > by
> >> > > > > Lucene engine for each query result? I am thinking how to do
> >> sorted
> >> > > > > merge properly in this case.
> >> > > > >
> >> > > > > ср, 25 сент. 2019 г. в 18:56, Yuriy Shuliga  >:
> >> > > > > >
> >> > > > > > Ivan,
> >> > > > > >
> >> > > > > > Thank you for interesting question!
> >> > > > > >
> >> > > > > > Text searches (or full text searches) are mostly
> human-oriented.
> >> > And
> >> > > > the
> >> > > > > > point of user's interest is topmost part of response.
> >> > > > > > Then user can read it, evaluate and use the given records for
> >> > further
> >> > > > > > purposes.
> >> > > > > >
> >> > > > > > Particularly in our case, we use Ignite for operations with
> >> > financial
> >> > > > > data,
> >> > > > > > and there lots of text stuff like assets names, fin.
> >> instruments,
> >> > > > > companies
> >> > > > > > etc.
> >> > > > > > In order to operate with this quickly and reliably, users used
> >> to
> >> > > work
> >> > > > > with
> >> > > > > > text search, type-ahead completions, suggestions.
> >> > > > > >
> >> > > > > > For this purposes we are indexing particular string data in
> >> > separate
> >> > > > > caches.
> >> > > > > >
> >> > > > > > Sorting capabilities and response size limitations are very
> >> > important
> >> > > > > > there. As our API have to provide most relevant information in
> >> view
> >> > > of
> >> > > > > > limited size.
> >> > > > > >
> >> > > > > > Now let me comment some Ignite/Lucene perspective.
> >> > > > > > Actually Ignite queries and Lucene returns *TopDocs.scoresDocs
> >> > > *already
> >> > > > > > sorted by *score *(relevance). So most relevant documents are
> on
> >> > the
> >> > > > top.
> >> > > > > > And currently distributed queries responses from different
> nodes
> >> > are
> >> > > > > merged
> >> > > > > > into final query cursor queue in arbitrary way.
> >> > > > > > So in fact we already have the score order ruined here. Also
> >> Ignite
> >> > > > > > requests all possible documents from Lucene that is redundant
> >> and
> >> > not
> >> > > > > good
> >> > > > > > for performance.
> >> > > > > >
> >> > > > > > I'm implementing *limit* parameter to be part of *TextQuery
> *and
> >> > have
> >> > > > to
> >> > > > > > notice that we still have to add sorting for text queries
> >> > processing
> >> > > in
> >> > > > > > order to have applicable results.
> >> > > > > >
> >> > > > > > *Limit* parameter itself should improve the part of issues
> from
> >> > > above,
> 

Re: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-10-17 Thread Yuriy Shuliga
Andrey,

Per your request, I created ticket
https://issues.apache.org/jira/browse/IGNITE-12291 linked to
https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12189

Could you please proceed with the PR merge?

BR,
Yuriy Shuliha

Wed, Oct 9, 2019 at 12:52, Andrey Mashenkov wrote:

> Hi Yuri,
>
> To get access to TC Bot, you should register as a TeamCity user [1], if you
> haven't done so already.
> Then you will be able to log in on the Ignite TC Bot page with the same
> credentials.
>
> [1] https://ci.ignite.apache.org/registerUser.html
>
> On Fri, Oct 4, 2019 at 3:10 PM Yuriy Shuliga  wrote:
>
>> Andrew,
>>
>> I have corrected PR according to your notes. Please review.
>> What will be the next steps in order to merge in?
>>
>> Y.
>>
>> Thu, Oct 3, 2019 at 17:47, Andrey Mashenkov wrote:
>>
>> > Yuri,
>> >
>> > I've done with review.
>> > No crime found, but trivial compatibility bug.
>> >
>> > On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga  wrote:
>> >
>> > > Denis,
>> > >
>> > > Thank you for your attention to this.
>> > > as for now, the https://issues.apache.org/jira/browse/IGNITE-12189
>> > ticket
>> > > is still pending review.
>> > > Do we have a chance to move it forward somehow?
>> > >
>> > > BR,
>> > > Yuriy Shuliha
>> > >
>> > > Mon, Sep 30, 2019 at 23:35, Denis Magda wrote:
>> > >
>> > > > Yuriy,
>> > > >
>> > > > I've seen you opening a pull-request with the first changes:
>> > > > https://issues.apache.org/jira/browse/IGNITE-12189
>> > > >
>> > > > Alex Scherbakov and Ivan are you the right guys to do the review?
>> > > >
>> > > > -
>> > > > Denis
>> > > >
>> > > >
>> > > > On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван 
>> > > wrote:
>> > > >
>> > > > > Yuriy,
>> > > > >
>> > > > > Thank you for providing details! Quite interesting.
>> > > > >
>> > > > > Yes, we already have support of distributed limit and merging
>> sorted
>> > > > > subresults for SQL queries. E.g. ReduceIndexSorted and
>> > > > > MergeStreamIterator are used for merging sorted streams.
>> > > > >
>> > > > > Could you please also clarify about score/relevance? Is it
>> provided
>> > by
>> > > > > Lucene engine for each query result? I am thinking how to do
>> sorted
>> > > > > merge properly in this case.
>> > > > >
>> > > > > ср, 25 сент. 2019 г. в 18:56, Yuriy Shuliga :
>> > > > > >
>> > > > > > Ivan,
>> > > > > >
>> > > > > > Thank you for interesting question!
>> > > > > >
>> > > > > > Text searches (or full text searches) are mostly human-oriented.
>> > And
>> > > > the
>> > > > > > point of user's interest is topmost part of response.
>> > > > > > Then user can read it, evaluate and use the given records for
>> > further
>> > > > > > purposes.
>> > > > > >
>> > > > > > Particularly in our case, we use Ignite for operations with
>> > financial
>> > > > > data,
>> > > > > > and there lots of text stuff like assets names, fin.
>> instruments,
>> > > > > companies
>> > > > > > etc.
>> > > > > > In order to operate with this quickly and reliably, users used
>> to
>> > > work
>> > > > > with
>> > > > > > text search, type-ahead completions, suggestions.
>> > > > > >
>> > > > > > For this purposes we are indexing particular string data in
>> > separate
>> > > > > caches.
>> > > > > >
>> > > > > > Sorting capabilities and response size limitations are very
>> > important
>> > > > > > there. As our API have to provide most relevant information in
>> view
>> > > of
>> > > > > > limited size.
>> > > > > >
>> > > > > > Now let me comment some Ignite/Lucene perspective.
>> > > > > > Actually Ignite queries and Lucene returns *TopDocs.scoresDocs
>> > > *already
>> > > > > > sorted by *score *(relevance). So most relevant documents are on
>> > the
>> > > > top.
>> > > > > > And currently distributed queries responses from different nodes
>> > are
>> > > > > merged
>> > > > > > into final query cursor queue in arbitrary way.
>> > > > > > So in fact we already have the score order ruined here. Also
>> Ignite
>> > > > > > requests all possible documents from Lucene that is redundant
>> and
>> > not
>> > > > > good
>> > > > > > for performance.
>> > > > > >
>> > > > > > I'm implementing *limit* parameter to be part of *TextQuery *and
>> > have
>> > > > to
>> > > > > > notice that we still have to add sorting for text queries
>> > processing
>> > > in
>> > > > > > order to have applicable results.
>> > > > > >
>> > > > > > *Limit* parameter itself should improve the part of issues from
>> > > above,
>> > > > > but
>> > > > > > definitely, sorting by document score at least  should be
>> > implemented
>> > > > > along
>> > > > > > with limit.
>> > > > > >
>> > > > > > This is a pretty short commentary if you still have any
>> questions,
>> > > > please
>> > > > > > ask, do not hesitate)
>> > > > > >
>> > > > > > BR,
>> > > > > > Yuriy Shuliha
>> > > > > >
>> > > > > > чт, 19 вер. 2019 о 11:38 Павлухин Иван 
>> пише:
>> > > > > >
>> > > > > > > Yuriy,
>> > > > > > >
>> > > > > > > Greatly appreciate your interest.
>> > > > > > >
>> > > > > > > 

Re: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-10-04 Thread Andrey Mashenkov
Yuriy,

Just FYI, we have a review checklist [1] and coding guidelines [2].
To test a PR, you can use TeamCity [3] or the TeamCityBot project [4].

The latter option (TCBot) makes test validation much easier, and you do not
have to bother with flaky tests.
Long story short, you can trigger tests for the PR from the Bot page and then
make the Bot attach the results to the Jira ticket if you find them
acceptable.

So the next step is to run the tests and check that everything is OK.

[1] https://cwiki.apache.org/confluence/display/IGNITE/Review+Checklist
[2] https://cwiki.apache.org/confluence/display/IGNITE/Coding+Guidelines
[3] https://ci.ignite.apache.org/
[4] https://mtcga.gridgain.com/



On Fri, Oct 4, 2019 at 3:10 PM Yuriy Shuliga  wrote:

> Andrew,
>
> I have corrected PR according to your notes. Please review.
> What will be the next steps in order to merge in?
>
> Y.
>
> чт, 3 жовт. 2019 о 17:47 Andrey Mashenkov 
> пише:
>
> > Yuri,
> >
> > I've done with review.
> > No crime found, but trivial compatibility bug.
> >
> > On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga  wrote:
> >
> > > Denis,
> > >
> > > Thank you for your attention to this.
> > > as for now, the https://issues.apache.org/jira/browse/IGNITE-12189
> > ticket
> > > is still pending review.
> > > Do we have a chance to move it forward somehow?
> > >
> > > BR,
> > > Yuriy Shuliha
> > >
> > > пн, 30 вер. 2019 о 23:35 Denis Magda  пише:
> > >
> > > > Yuriy,
> > > >
> > > > I've seen you opening a pull-request with the first changes:
> > > > https://issues.apache.org/jira/browse/IGNITE-12189
> > > >
> > > > Alex Scherbakov and Ivan are you the right guys to do the review?
> > > >
> > > > -
> > > > Denis
> > > >
> > > >
> > > > On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван 
> > > wrote:
> > > >
> > > > > Yuriy,
> > > > >
> > > > > Thank you for providing details! Quite interesting.
> > > > >
> > > > > Yes, we already have support of distributed limit and merging
> sorted
> > > > > subresults for SQL queries. E.g. ReduceIndexSorted and
> > > > > MergeStreamIterator are used for merging sorted streams.
> > > > >
> > > > > Could you please also clarify about score/relevance? Is it provided
> > by
> > > > > Lucene engine for each query result? I am thinking how to do sorted
> > > > > merge properly in this case.
> > > > >
> > > > > ср, 25 сент. 2019 г. в 18:56, Yuriy Shuliga :
> > > > > >
> > > > > > Ivan,
> > > > > >
> > > > > > Thank you for interesting question!
> > > > > >
> > > > > > Text searches (or full text searches) are mostly human-oriented.
> > And
> > > > the
> > > > > > point of user's interest is topmost part of response.
> > > > > > Then user can read it, evaluate and use the given records for
> > further
> > > > > > purposes.
> > > > > >
> > > > > > Particularly in our case, we use Ignite for operations with
> > financial
> > > > > data,
> > > > > > and there lots of text stuff like assets names, fin. instruments,
> > > > > companies
> > > > > > etc.
> > > > > > In order to operate with this quickly and reliably, users used to
> > > work
> > > > > with
> > > > > > text search, type-ahead completions, suggestions.
> > > > > >
> > > > > > For this purposes we are indexing particular string data in
> > separate
> > > > > caches.
> > > > > >
> > > > > > Sorting capabilities and response size limitations are very
> > important
> > > > > > there. As our API have to provide most relevant information in
> view
> > > of
> > > > > > limited size.
> > > > > >
> > > > > > Now let me comment some Ignite/Lucene perspective.
> > > > > > Actually Ignite queries and Lucene returns *TopDocs.scoresDocs
> > > *already
> > > > > > sorted by *score *(relevance). So most relevant documents are on
> > the
> > > > top.
> > > > > > And currently distributed queries responses from different nodes
> > are
> > > > > merged
> > > > > > into final query cursor queue in arbitrary way.
> > > > > > So in fact we already have the score order ruined here. Also
> Ignite
> > > > > > requests all possible documents from Lucene that is redundant and
> > not
> > > > > good
> > > > > > for performance.
> > > > > >
> > > > > > I'm implementing *limit* parameter to be part of *TextQuery *and
> > have
> > > > to
> > > > > > notice that we still have to add sorting for text queries
> > processing
> > > in
> > > > > > order to have applicable results.
> > > > > >
> > > > > > *Limit* parameter itself should improve the part of issues from
> > > above,
> > > > > but
> > > > > > definitely, sorting by document score at least  should be
> > implemented
> > > > > along
> > > > > > with limit.
> > > > > >
> > > > > > This is a pretty short commentary if you still have any
> questions,
> > > > please
> > > > > > ask, do not hesitate)
> > > > > >
> > > > > > BR,
> > > > > > Yuriy Shuliha
> > > > > >
> > > > > > чт, 19 вер. 2019 о 11:38 Павлухин Иван 
> пише:
> > > > > >
> > > > > > > Yuriy,
> > > > > > >
> > > > > > > Greatly 

Re: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-10-04 Thread Yuriy Shuliga
Andrew,

I have corrected the PR according to your notes. Please review.
What are the next steps to get it merged?

Y.

Thu, Oct 3, 2019 at 17:47, Andrey Mashenkov wrote:

> Yuri,
>
> I've done with review.
> No crime found, but trivial compatibility bug.
>
> On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga  wrote:
>
> > Denis,
> >
> > Thank you for your attention to this.
> > as for now, the https://issues.apache.org/jira/browse/IGNITE-12189
> ticket
> > is still pending review.
> > Do we have a chance to move it forward somehow?
> >
> > BR,
> > Yuriy Shuliha
> >
> > пн, 30 вер. 2019 о 23:35 Denis Magda  пише:
> >
> > > Yuriy,
> > >
> > > I've seen you opening a pull-request with the first changes:
> > > https://issues.apache.org/jira/browse/IGNITE-12189
> > >
> > > Alex Scherbakov and Ivan are you the right guys to do the review?
> > >
> > > -
> > > Denis
> > >
> > >
> > > On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван 
> > wrote:
> > >
> > > > Yuriy,
> > > >
> > > > Thank you for providing details! Quite interesting.
> > > >
> > > > Yes, we already have support of distributed limit and merging sorted
> > > > subresults for SQL queries. E.g. ReduceIndexSorted and
> > > > MergeStreamIterator are used for merging sorted streams.
> > > >
> > > > Could you please also clarify about score/relevance? Is it provided
> by
> > > > Lucene engine for each query result? I am thinking how to do sorted
> > > > merge properly in this case.
> > > >
> > > > ср, 25 сент. 2019 г. в 18:56, Yuriy Shuliga :
> > > > >
> > > > > Ivan,
> > > > >
> > > > > Thank you for interesting question!
> > > > >
> > > > > Text searches (or full text searches) are mostly human-oriented.
> And
> > > the
> > > > > point of user's interest is topmost part of response.
> > > > > Then user can read it, evaluate and use the given records for
> further
> > > > > purposes.
> > > > >
> > > > > Particularly in our case, we use Ignite for operations with
> financial
> > > > data,
> > > > > and there lots of text stuff like assets names, fin. instruments,
> > > > companies
> > > > > etc.
> > > > > In order to operate with this quickly and reliably, users used to
> > work
> > > > with
> > > > > text search, type-ahead completions, suggestions.
> > > > >
> > > > > For this purposes we are indexing particular string data in
> separate
> > > > caches.
> > > > >
> > > > > Sorting capabilities and response size limitations are very
> important
> > > > > there. As our API have to provide most relevant information in view
> > of
> > > > > limited size.
> > > > >
> > > > > Now let me comment some Ignite/Lucene perspective.
> > > > > Actually Ignite queries and Lucene returns *TopDocs.scoresDocs
> > *already
> > > > > sorted by *score *(relevance). So most relevant documents are on
> the
> > > top.
> > > > > And currently distributed queries responses from different nodes
> are
> > > > merged
> > > > > into final query cursor queue in arbitrary way.
> > > > > So in fact we already have the score order ruined here. Also Ignite
> > > > > requests all possible documents from Lucene that is redundant and
> not
> > > > good
> > > > > for performance.
> > > > >
> > > > > I'm implementing *limit* parameter to be part of *TextQuery *and
> have
> > > to
> > > > > notice that we still have to add sorting for text queries
> processing
> > in
> > > > > order to have applicable results.
> > > > >
> > > > > *Limit* parameter itself should improve the part of issues from
> > above,
> > > > but
> > > > > definitely, sorting by document score at least  should be
> implemented
> > > > along
> > > > > with limit.
> > > > >
> > > > > This is a pretty short commentary if you still have any questions,
> > > please
> > > > > ask, do not hesitate)
> > > > >
> > > > > BR,
> > > > > Yuriy Shuliha
> > > > >
> > > > > чт, 19 вер. 2019 о 11:38 Павлухин Иван  пише:
> > > > >
> > > > > > Yuriy,
> > > > > >
> > > > > > Greatly appreciate your interest.
> > > > > >
> > > > > > Could you please elaborate a little bit about sorting? What tasks
> > > does
> > > > > > it help to solve and how? It would be great to provide an
> example.
> > > > > >
> > > > > > ср, 18 сент. 2019 г. в 09:39, Alexei Scherbakov <
> > > > > > alexey.scherbak...@gmail.com>:
> > > > > > >
> > > > > > > Denis,
> > > > > > >
> > > > > > > I like the idea of throwing an exception for enabled text
> queries
> > > on
> > > > > > > persistent caches.
> > > > > > >
> > > > > > > Also I'm fine with proposed limit for unsorted searches.
> > > > > > >
> > > > > > > Yury, please proceed with ticket creation.
> > > > > > >
> > > > > > > вт, 17 сент. 2019 г., 22:06 Denis Magda :
> > > > > > >
> > > > > > > > Igniters,
> > > > > > > >
> > > > > > > > I see nothing wrong with Yury's proposal in regards full-text
> > > > search
> > > > > > API
> > > > > > > > evolution as long as Yury is ready to push it forward.
> > > > > > > >
> > > > > > > > As for the in-memory mode only, it makes total sense for
> > > in-memory
> > > > data
> > > > > > 

Re: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-10-04 Thread Ivan Pavlukhin
Yuriy,

Thank you, fine with it.

Fri, Oct 4, 2019 at 11:01, Yuriy Shuliga wrote:
>
> Ivan,
>
> Yes, your observation is correct.
>
> This behavior lasts from the very beginning when Lucene indexing was
> implemented for distributed queries.
> Implementation of the *limit* solves the problem of redundant response
> size. Without it *ALL* off the records are fetched each time; that is not
> good, especially for loose patterns.
> In order to solve relevance issue correct sorting should be implemented.
>
> Y.
>
> пт, 4 жовт. 2019 о 10:45 Ivan Pavlukhin  пише:
>
> > Yuriy,
> >
> > Am I getting it right that in your PR if we have a limit N than
> > returned items (at most N) will not be strictly the most relevant
> > ones? E.g. if one node returned N items faster than others but with
> > not so good relevance?
> >
> > чт, 3 окт. 2019 г. в 17:47, Andrey Mashenkov :
> > >
> > > Yuri,
> > >
> > > I've done with review.
> > > No crime found, but trivial compatibility bug.
> > >
> > > On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga  wrote:
> > >
> > > > Denis,
> > > >
> > > > Thank you for your attention to this.
> > > > as for now, the https://issues.apache.org/jira/browse/IGNITE-12189
> > ticket
> > > > is still pending review.
> > > > Do we have a chance to move it forward somehow?
> > > >
> > > > BR,
> > > > Yuriy Shuliha
> > > >
> > > > пн, 30 вер. 2019 о 23:35 Denis Magda  пише:
> > > >
> > > > > Yuriy,
> > > > >
> > > > > I've seen you opening a pull-request with the first changes:
> > > > > https://issues.apache.org/jira/browse/IGNITE-12189
> > > > >
> > > > > Alex Scherbakov and Ivan are you the right guys to do the review?
> > > > >
> > > > > -
> > > > > Denis
> > > > >
> > > > >
> > > > > On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван 
> > > > wrote:
> > > > >
> > > > > > Yuriy,
> > > > > >
> > > > > > Thank you for providing details! Quite interesting.
> > > > > >
> > > > > > Yes, we already have support of distributed limit and merging
> > sorted
> > > > > > subresults for SQL queries. E.g. ReduceIndexSorted and
> > > > > > MergeStreamIterator are used for merging sorted streams.
> > > > > >
> > > > > > Could you please also clarify about score/relevance? Is it
> > provided by
> > > > > > Lucene engine for each query result? I am thinking how to do sorted
> > > > > > merge properly in this case.
> > > > > >
> > > > > > ср, 25 сент. 2019 г. в 18:56, Yuriy Shuliga :
> > > > > > >
> > > > > > > Ivan,
> > > > > > >
> > > > > > > Thank you for interesting question!
> > > > > > >
> > > > > > > Text searches (or full text searches) are mostly human-oriented.
> > And
> > > > > the
> > > > > > > point of user's interest is topmost part of response.
> > > > > > > Then user can read it, evaluate and use the given records for
> > further
> > > > > > > purposes.
> > > > > > >
> > > > > > > Particularly in our case, we use Ignite for operations with
> > financial
> > > > > > data,
> > > > > > > and there lots of text stuff like assets names, fin. instruments,
> > > > > > companies
> > > > > > > etc.
> > > > > > > In order to operate with this quickly and reliably, users used to
> > > > work
> > > > > > with
> > > > > > > text search, type-ahead completions, suggestions.
> > > > > > >
> > > > > > > For this purposes we are indexing particular string data in
> > separate
> > > > > > caches.
> > > > > > >
> > > > > > > Sorting capabilities and response size limitations are very
> > important
> > > > > > > there. As our API have to provide most relevant information in
> > view
> > > > of
> > > > > > > limited size.
> > > > > > >
> > > > > > > Now let me comment some Ignite/Lucene perspective.
> > > > > > > Actually Ignite queries and Lucene returns *TopDocs.scoresDocs
> > > > *already
> > > > > > > sorted by *score *(relevance). So most relevant documents are on
> > the
> > > > > top.
> > > > > > > And currently distributed queries responses from different nodes
> > are
> > > > > > merged
> > > > > > > into final query cursor queue in arbitrary way.
> > > > > > > So in fact we already have the score order ruined here. Also
> > Ignite
> > > > > > > requests all possible documents from Lucene that is redundant
> > and not
> > > > > > good
> > > > > > > for performance.
> > > > > > >
> > > > > > > I'm implementing *limit* parameter to be part of *TextQuery *and
> > have
> > > > > to
> > > > > > > notice that we still have to add sorting for text queries
> > processing
> > > > in
> > > > > > > order to have applicable results.
> > > > > > >
> > > > > > > *Limit* parameter itself should improve the part of issues from
> > > > above,
> > > > > > but
> > > > > > > definitely, sorting by document score at least  should be
> > implemented
> > > > > > along
> > > > > > > with limit.
> > > > > > >
> > > > > > > This is a pretty short commentary if you still have any
> > questions,
> > > > > please
> > > > > > > ask, do not hesitate)
> > > > > > >
> > > > > > > BR,
> > > > > > > Yuriy Shuliha
> > > > > > >
> > > > > > > чт, 19 вер. 2019 о 

Re: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-10-04 Thread Yuriy Shuliga
Ivan,

Yes, your observation is correct.

This behavior has been there from the very beginning, since Lucene indexing
was first implemented for distributed queries.
Implementing the *limit* solves the problem of redundant response size.
Without it, *ALL* of the records are fetched each time; that is not good,
especially for loose patterns.
To solve the relevance issue, correct sorting should be implemented.
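
As a rough illustration of what "correct sorting" could mean here, the sketch
below merges per-node responses that are each already sorted by descending
score and stops at the limit. It is not the code from the PR: the Scored
holder, the NodeCursor helper and the mergeTopN method are hypothetical names
used only for this example, and real responses would arrive page by page
rather than as complete lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class ScoredMergeSketch {
    /** Hypothetical holder: a cache key plus the score reported by its node. */
    public static final class Scored {
        public final String key;
        public final float score;

        public Scored(String key, float score) {
            this.key = key;
            this.score = score;
        }
    }

    /** Cursor over one node's response; 'head' is the next unconsumed entry. */
    private static final class NodeCursor {
        final Iterator<Scored> it;
        Scored head;

        NodeCursor(Iterator<Scored> it) {
            this.it = it;
            this.head = it.next();
        }

        /** Moves to the next entry; returns false when the response is exhausted. */
        boolean advance() {
            if (!it.hasNext())
                return false;
            head = it.next();
            return true;
        }
    }

    /** K-way merge: each input list must already be sorted by descending score. */
    public static List<Scored> mergeTopN(List<List<Scored>> perNode, int limit) {
        // Max-heap on the current head score of every node cursor.
        PriorityQueue<NodeCursor> heap = new PriorityQueue<>(
            Comparator.comparingDouble((NodeCursor c) -> c.head.score).reversed());

        for (List<Scored> resp : perNode)
            if (!resp.isEmpty())
                heap.add(new NodeCursor(resp.iterator()));

        List<Scored> res = new ArrayList<>();

        while (!heap.isEmpty() && res.size() < limit) {
            NodeCursor c = heap.poll();
            res.add(c.head);

            // Re-insert the cursor only if its node has more entries.
            if (c.advance())
                heap.add(c);
        }

        return res;
    }

    public static void main(String[] args) {
        List<List<Scored>> nodes = Arrays.asList(
            Arrays.asList(new Scored("a", 0.9f), new Scored("b", 0.4f)),
            Arrays.asList(new Scored("c", 0.8f), new Scored("d", 0.7f)),
            Arrays.asList(new Scored("e", 0.5f)));

        for (Scored s : mergeTopN(nodes, 3))
            System.out.println(s.key + " " + s.score);
    }
}
```

With a limit of N, each node only needs to return its own top N entries for
the merged result to be the global top N, which is why limit and sorting fit
together.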

Y.

Fri, Oct 4, 2019 at 10:45, Ivan Pavlukhin wrote:

> Yuriy,
>
> Am I getting it right that in your PR if we have a limit N than
> returned items (at most N) will not be strictly the most relevant
> ones? E.g. if one node returned N items faster than others but with
> not so good relevance?
>
> чт, 3 окт. 2019 г. в 17:47, Andrey Mashenkov :
> >
> > Yuri,
> >
> > I've done with review.
> > No crime found, but trivial compatibility bug.
> >
> > On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga  wrote:
> >
> > > Denis,
> > >
> > > Thank you for your attention to this.
> > > as for now, the https://issues.apache.org/jira/browse/IGNITE-12189
> ticket
> > > is still pending review.
> > > Do we have a chance to move it forward somehow?
> > >
> > > BR,
> > > Yuriy Shuliha
> > >
> > > пн, 30 вер. 2019 о 23:35 Denis Magda  пише:
> > >
> > > > Yuriy,
> > > >
> > > > I've seen you opening a pull-request with the first changes:
> > > > https://issues.apache.org/jira/browse/IGNITE-12189
> > > >
> > > > Alex Scherbakov and Ivan are you the right guys to do the review?
> > > >
> > > > -
> > > > Denis
> > > >
> > > >
> > > > On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван 
> > > wrote:
> > > >
> > > > > Yuriy,
> > > > >
> > > > > Thank you for providing details! Quite interesting.
> > > > >
> > > > > Yes, we already have support of distributed limit and merging
> sorted
> > > > > subresults for SQL queries. E.g. ReduceIndexSorted and
> > > > > MergeStreamIterator are used for merging sorted streams.
> > > > >
> > > > > Could you please also clarify about score/relevance? Is it
> provided by
> > > > > Lucene engine for each query result? I am thinking how to do sorted
> > > > > merge properly in this case.
> > > > >
> > > > > ср, 25 сент. 2019 г. в 18:56, Yuriy Shuliga :
> > > > > >
> > > > > > Ivan,
> > > > > >
> > > > > > Thank you for interesting question!
> > > > > >
> > > > > > Text searches (or full text searches) are mostly human-oriented.
> And
> > > > the
> > > > > > point of user's interest is topmost part of response.
> > > > > > Then user can read it, evaluate and use the given records for
> further
> > > > > > purposes.
> > > > > >
> > > > > > Particularly in our case, we use Ignite for operations with
> financial
> > > > > data,
> > > > > > and there lots of text stuff like assets names, fin. instruments,
> > > > > companies
> > > > > > etc.
> > > > > > In order to operate with this quickly and reliably, users used to
> > > work
> > > > > with
> > > > > > text search, type-ahead completions, suggestions.
> > > > > >
> > > > > > For this purposes we are indexing particular string data in
> separate
> > > > > caches.
> > > > > >
> > > > > > Sorting capabilities and response size limitations are very
> important
> > > > > > there. As our API have to provide most relevant information in
> view
> > > of
> > > > > > limited size.
> > > > > >
> > > > > > Now let me comment some Ignite/Lucene perspective.
> > > > > > Actually Ignite queries and Lucene returns *TopDocs.scoresDocs
> > > *already
> > > > > > sorted by *score *(relevance). So most relevant documents are on
> the
> > > > top.
> > > > > > And currently distributed queries responses from different nodes
> are
> > > > > merged
> > > > > > into final query cursor queue in arbitrary way.
> > > > > > So in fact we already have the score order ruined here. Also
> Ignite
> > > > > > requests all possible documents from Lucene that is redundant
> and not
> > > > > good
> > > > > > for performance.
> > > > > >
> > > > > > I'm implementing *limit* parameter to be part of *TextQuery *and
> have
> > > > to
> > > > > > notice that we still have to add sorting for text queries
> processing
> > > in
> > > > > > order to have applicable results.
> > > > > >
> > > > > > *Limit* parameter itself should improve the part of issues from
> > > above,
> > > > > but
> > > > > > definitely, sorting by document score at least  should be
> implemented
> > > > > along
> > > > > > with limit.
> > > > > >
> > > > > > This is a pretty short commentary if you still have any
> questions,
> > > > please
> > > > > > ask, do not hesitate)
> > > > > >
> > > > > > BR,
> > > > > > Yuriy Shuliha
> > > > > >
> > > > > > чт, 19 вер. 2019 о 11:38 Павлухин Иван 
> пише:
> > > > > >
> > > > > > > Yuriy,
> > > > > > >
> > > > > > > Greatly appreciate your interest.
> > > > > > >
> > > > > > > Could you please elaborate a little bit about sorting? What
> tasks
> > > > does
> > > > > > > it help to solve and how? It would be great to provide an
> example.
> > > > > > >
> > > > > > > ср, 18 сент. 2019 г. в 09:39, 

Re: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-10-04 Thread Ivan Pavlukhin
Yuriy,

Am I getting it right that in your PR, if we have a limit N, then the
returned items (at most N) will not necessarily be the most relevant
ones? E.g. if one node returned N items faster than the others but with
lower relevance?
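
A small made-up example of this scenario, as a sketch only: the Hit class,
the method names and the scores are hypothetical and are not taken from the
PR. It compares keeping the first N hits in arrival order with keeping the N
best-scored hits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ArrivalOrderLimitDemo {
    /** Hypothetical result entry: a key plus its relevance score. */
    public static final class Hit {
        public final String key;
        public final float score;

        public Hit(String key, float score) {
            this.key = key;
            this.score = score;
        }
    }

    /** Keeps the first n hits in arrival order, ignoring scores (n >= 0). */
    public static List<String> firstArrived(List<Hit> arrived, int n) {
        List<String> keys = new ArrayList<>();

        for (Hit h : arrived.subList(0, Math.min(n, arrived.size())))
            keys.add(h.key);

        return keys;
    }

    /** Keeps the n best-scored hits regardless of arrival order (n >= 0). */
    public static List<String> mostRelevant(List<Hit> arrived, int n) {
        List<Hit> sorted = new ArrayList<>(arrived);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());

        return firstArrived(sorted, n);
    }

    public static void main(String[] args) {
        // Hits of the fast node (f1, f2) arrive before those of the slow node (s1, s2).
        List<Hit> arrived = Arrays.asList(
            new Hit("f1", 0.30f), new Hit("f2", 0.20f),
            new Hit("s1", 0.95f), new Hit("s2", 0.90f));

        System.out.println(firstArrived(arrived, 2)); // [f1, f2]
        System.out.println(mostRelevant(arrived, 2)); // [s1, s2]
    }
}
```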

Thu, Oct 3, 2019 at 17:47, Andrey Mashenkov wrote:
>
> Yuri,
>
> I've done with review.
> No crime found, but trivial compatibility bug.
>
> On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga  wrote:
>
> > Denis,
> >
> > Thank you for your attention to this.
> > as for now, the https://issues.apache.org/jira/browse/IGNITE-12189 ticket
> > is still pending review.
> > Do we have a chance to move it forward somehow?
> >
> > BR,
> > Yuriy Shuliha
> >
> > пн, 30 вер. 2019 о 23:35 Denis Magda  пише:
> >
> > > Yuriy,
> > >
> > > I've seen you opening a pull-request with the first changes:
> > > https://issues.apache.org/jira/browse/IGNITE-12189
> > >
> > > Alex Scherbakov and Ivan are you the right guys to do the review?
> > >
> > > -
> > > Denis
> > >
> > >
> > > On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван 
> > wrote:
> > >
> > > > Yuriy,
> > > >
> > > > Thank you for providing details! Quite interesting.
> > > >
> > > > Yes, we already have support of distributed limit and merging sorted
> > > > subresults for SQL queries. E.g. ReduceIndexSorted and
> > > > MergeStreamIterator are used for merging sorted streams.
> > > >
> > > > Could you please also clarify about score/relevance? Is it provided by
> > > > Lucene engine for each query result? I am thinking how to do sorted
> > > > merge properly in this case.
> > > >
> > > > ср, 25 сент. 2019 г. в 18:56, Yuriy Shuliga :
> > > > >
> > > > > Ivan,
> > > > >
> > > > > Thank you for interesting question!
> > > > >
> > > > > Text searches (or full text searches) are mostly human-oriented. And
> > > the
> > > > > point of user's interest is topmost part of response.
> > > > > Then user can read it, evaluate and use the given records for further
> > > > > purposes.
> > > > >
> > > > > Particularly in our case, we use Ignite for operations with financial
> > > > data,
> > > > > and there lots of text stuff like assets names, fin. instruments,
> > > > companies
> > > > > etc.
> > > > > In order to operate with this quickly and reliably, users used to
> > work
> > > > with
> > > > > text search, type-ahead completions, suggestions.
> > > > >
> > > > > For this purposes we are indexing particular string data in separate
> > > > caches.
> > > > >
> > > > > Sorting capabilities and response size limitations are very important
> > > > > there. As our API have to provide most relevant information in view
> > of
> > > > > limited size.
> > > > >
> > > > > Now let me comment some Ignite/Lucene perspective.
> > > > > Actually Ignite queries and Lucene returns *TopDocs.scoresDocs
> > *already
> > > > > sorted by *score *(relevance). So most relevant documents are on the
> > > top.
> > > > > And currently distributed queries responses from different nodes are
> > > > merged
> > > > > into final query cursor queue in arbitrary way.
> > > > > So in fact we already have the score order ruined here. Also Ignite
> > > > > requests all possible documents from Lucene that is redundant and not
> > > > good
> > > > > for performance.
> > > > >
> > > > > I'm implementing *limit* parameter to be part of *TextQuery *and have
> > > to
> > > > > notice that we still have to add sorting for text queries processing
> > in
> > > > > order to have applicable results.
> > > > >
> > > > > *Limit* parameter itself should improve the part of issues from
> > above,
> > > > but
> > > > > definitely, sorting by document score at least  should be implemented
> > > > along
> > > > > with limit.
> > > > >
> > > > > This is a pretty short commentary if you still have any questions,
> > > please
> > > > > ask, do not hesitate)
> > > > >
> > > > > BR,
> > > > > Yuriy Shuliha
> > > > >
> > > > > чт, 19 вер. 2019 о 11:38 Павлухин Иван  пише:
> > > > >
> > > > > > Yuriy,
> > > > > >
> > > > > > Greatly appreciate your interest.
> > > > > >
> > > > > > Could you please elaborate a little bit about sorting? What tasks
> > > does
> > > > > > it help to solve and how? It would be great to provide an example.
> > > > > >
> > > > > > ср, 18 сент. 2019 г. в 09:39, Alexei Scherbakov <
> > > > > > alexey.scherbak...@gmail.com>:
> > > > > > >
> > > > > > > Denis,
> > > > > > >
> > > > > > > I like the idea of throwing an exception for enabled text queries
> > > on
> > > > > > > persistent caches.
> > > > > > >
> > > > > > > Also I'm fine with proposed limit for unsorted searches.
> > > > > > >
> > > > > > > Yury, please proceed with ticket creation.
> > > > > > >
> > > > > > > вт, 17 сент. 2019 г., 22:06 Denis Magda :
> > > > > > >
> > > > > > > > Igniters,
> > > > > > > >
> > > > > > > > I see nothing wrong with Yury's proposal in regards full-text
> > > > search
> > > > > > API
> > > > > > > > evolution as long as Yury is ready to push it forward.
> > > > > > > >
> > > > > > > > As for the in-memory 

Re: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-10-03 Thread Andrey Mashenkov
Yuri,

I'm done with the review.
No serious problems found, only a trivial compatibility bug.

On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga  wrote:

> Denis,
>
> Thank you for your attention to this.
> as for now, the https://issues.apache.org/jira/browse/IGNITE-12189 ticket
> is still pending review.
> Do we have a chance to move it forward somehow?
>
> BR,
> Yuriy Shuliha
>
> пн, 30 вер. 2019 о 23:35 Denis Magda  пише:
>
> > Yuriy,
> >
> > I've seen you opening a pull-request with the first changes:
> > https://issues.apache.org/jira/browse/IGNITE-12189
> >
> > Alex Scherbakov and Ivan are you the right guys to do the review?
> >
> > -
> > Denis
> >
> >
> > On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван 
> wrote:
> >
> > > Yuriy,
> > >
> > > Thank you for providing details! Quite interesting.
> > >
> > > Yes, we already have support of distributed limit and merging sorted
> > > subresults for SQL queries. E.g. ReduceIndexSorted and
> > > MergeStreamIterator are used for merging sorted streams.
> > >
> > > Could you please also clarify about score/relevance? Is it provided by
> > > Lucene engine for each query result? I am thinking how to do sorted
> > > merge properly in this case.
> > >
> > > ср, 25 сент. 2019 г. в 18:56, Yuriy Shuliga :
> > > >
> > > > Ivan,
> > > >
> > > > Thank you for interesting question!
> > > >
> > > > Text searches (or full text searches) are mostly human-oriented. And
> > the
> > > > point of user's interest is topmost part of response.
> > > > Then user can read it, evaluate and use the given records for further
> > > > purposes.
> > > >
> > > > Particularly in our case, we use Ignite for operations with financial
> > > data,
> > > > and there lots of text stuff like assets names, fin. instruments,
> > > companies
> > > > etc.
> > > > In order to operate with this quickly and reliably, users used to
> work
> > > with
> > > > text search, type-ahead completions, suggestions.
> > > >
> > > > For this purposes we are indexing particular string data in separate
> > > caches.
> > > >
> > > > Sorting capabilities and response size limitations are very important
> > > > there. As our API have to provide most relevant information in view
> of
> > > > limited size.
> > > >
> > > > Now let me comment some Ignite/Lucene perspective.
> > > > Actually Ignite queries and Lucene returns *TopDocs.scoresDocs
> *already
> > > > sorted by *score *(relevance). So most relevant documents are on the
> > top.
> > > > And currently distributed queries responses from different nodes are
> > > merged
> > > > into final query cursor queue in arbitrary way.
> > > > So in fact we already have the score order ruined here. Also Ignite
> > > > requests all possible documents from Lucene that is redundant and not
> > > good
> > > > for performance.
> > > >
> > > > I'm implementing *limit* parameter to be part of *TextQuery *and have
> > to
> > > > notice that we still have to add sorting for text queries processing
> in
> > > > order to have applicable results.
> > > >
> > > > *Limit* parameter itself should improve the part of issues from
> above,
> > > but
> > > > definitely, sorting by document score at least  should be implemented
> > > along
> > > > with limit.
> > > >
> > > > This is a pretty short commentary if you still have any questions,
> > please
> > > > ask, do not hesitate)
> > > >
> > > > BR,
> > > > Yuriy Shuliha
> > > >
> > > > Thu, 19 Sep 2019 at 11:38, Павлухин Иван wrote:
> > > >
> > > > > Yuriy,
> > > > >
> > > > > Greatly appreciate your interest.
> > > > >
> > > > > Could you please elaborate a little bit about sorting? What tasks
> > does
> > > > > it help to solve and how? It would be great to provide an example.
> > > > >
> > > > > Wed, 18 Sep 2019 at 09:39, Alexei Scherbakov <
> > > > > alexey.scherbak...@gmail.com>:
> > > > > >
> > > > > > Denis,
> > > > > >
> > > > > > I like the idea of throwing an exception for enabled text queries
> > on
> > > > > > persistent caches.
> > > > > >
> > > > > > Also I'm fine with proposed limit for unsorted searches.
> > > > > >
> > > > > > Yury, please proceed with ticket creation.
> > > > > >
> > > > > > Tue, 17 Sep 2019, 22:06, Denis Magda:
> > > > > >
> > > > > > > Igniters,
> > > > > > >
> > > > > > > I see nothing wrong with Yury's proposal in regards full-text
> > > search
> > > > > API
> > > > > > > evolution as long as Yury is ready to push it forward.
> > > > > > >
> > > > > > > As for the in-memory mode only, it makes total sense for
> > in-memory
> > > data
> > > > > > > grid deployments when Ignite caches data of an underlying DB
> like
> > > > > Postgres.
> > > > > > > As part of the changes, I would simply throw an exception (by
> > > default)
> > > > > if
> > > > > > > the one attempts to use text indices with the native
> persistence
> > > > > enabled.
> > > > > > > If the person is ready to live with that limitation that an
> > > explicit
> > > > > > > configuration change is needed to come around the exception.
> > > > > > >
> > > > > > > 

Re: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-10-03 Thread Yuriy Shuliga
Denis,

Thank you for your attention to this.
As of now, the https://issues.apache.org/jira/browse/IGNITE-12189 ticket
is still pending review.
Is there a way to move it forward?

BR,
Yuriy Shuliha

Mon, 30 Sep 2019 at 23:35, Denis Magda wrote:

> Yuriy,
>
> I've seen you opening a pull-request with the first changes:
> https://issues.apache.org/jira/browse/IGNITE-12189
>
> Alex Scherbakov and Ivan are you the right guys to do the review?
>
> -
> Denis
>
>
> On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван  wrote:
>
> > Yuriy,
> >
> > Thank you for providing details! Quite interesting.
> >
> > Yes, we already have support of distributed limit and merging sorted
> > subresults for SQL queries. E.g. ReduceIndexSorted and
> > MergeStreamIterator are used for merging sorted streams.
> >
> > Could you please also clarify about score/relevance? Is it provided by
> > Lucene engine for each query result? I am thinking how to do sorted
> > merge properly in this case.
> >
> > Wed, 25 Sep 2019 at 18:56, Yuriy Shuliga:
> > >
> > > Ivan,
> > >
> > > Thank you for interesting question!
> > >
> > > Text searches (or full text searches) are mostly human-oriented. And
> the
> > > point of user's interest is topmost part of response.
> > > Then user can read it, evaluate and use the given records for further
> > > purposes.
> > >
> > > Particularly in our case, we use Ignite for operations with financial
> > data,
> > > and there lots of text stuff like assets names, fin. instruments,
> > companies
> > > etc.
> > > In order to operate with this quickly and reliably, users used to work
> > with
> > > text search, type-ahead completions, suggestions.
> > >
> > > For this purposes we are indexing particular string data in separate
> > caches.
> > >
> > > Sorting capabilities and response size limitations are very important
> > > there. As our API have to provide most relevant information in view of
> > > limited size.
> > >
> > > Now let me comment some Ignite/Lucene perspective.
> > > Actually Ignite queries and Lucene returns *TopDocs.scoresDocs *already
> > > sorted by *score *(relevance). So most relevant documents are on the
> top.
> > > And currently distributed queries responses from different nodes are
> > merged
> > > into final query cursor queue in arbitrary way.
> > > So in fact we already have the score order ruined here. Also Ignite
> > > requests all possible documents from Lucene that is redundant and not
> > good
> > > for performance.
> > >
> > > I'm implementing *limit* parameter to be part of *TextQuery *and have
> to
> > > notice that we still have to add sorting for text queries processing in
> > > order to have applicable results.
> > >
> > > *Limit* parameter itself should improve the part of issues from above,
> > but
> > > definitely, sorting by document score at least  should be implemented
> > along
> > > with limit.
> > >
> > > This is a pretty short commentary if you still have any questions,
> please
> > > ask, do not hesitate)
> > >
> > > BR,
> > > Yuriy Shuliha
> > >
> > > Thu, 19 Sep 2019 at 11:38, Павлухин Иван wrote:
> > >
> > > > Yuriy,
> > > >
> > > > Greatly appreciate your interest.
> > > >
> > > > Could you please elaborate a little bit about sorting? What tasks
> does
> > > > it help to solve and how? It would be great to provide an example.
> > > >
> > > > Wed, 18 Sep 2019 at 09:39, Alexei Scherbakov <
> > > > alexey.scherbak...@gmail.com>:
> > > > >
> > > > > Denis,
> > > > >
> > > > > I like the idea of throwing an exception for enabled text queries
> on
> > > > > persistent caches.
> > > > >
> > > > > Also I'm fine with proposed limit for unsorted searches.
> > > > >
> > > > > Yury, please proceed with ticket creation.
> > > > >
> > > > > Tue, 17 Sep 2019, 22:06, Denis Magda:
> > > > >
> > > > > > Igniters,
> > > > > >
> > > > > > I see nothing wrong with Yury's proposal in regards full-text
> > search
> > > > API
> > > > > > evolution as long as Yury is ready to push it forward.
> > > > > >
> > > > > > As for the in-memory mode only, it makes total sense for
> in-memory
> > data
> > > > > > grid deployments when Ignite caches data of an underlying DB like
> > > > Postgres.
> > > > > > As part of the changes, I would simply throw an exception (by
> > default)
> > > > if
> > > > > > the one attempts to use text indices with the native persistence
> > > > enabled.
> > > > > > If the person is ready to live with that limitation that an
> > explicit
> > > > > > configuration change is needed to come around the exception.
> > > > > >
> > > > > > Thoughts?
> > > > > >
> > > > > >
> > > > > > -
> > > > > > Denis
> > > > > >
> > > > > >
> > > > > > On Tue, Sep 17, 2019 at 7:44 AM Yuriy Shuliga  >
> > > > wrote:
> > > > > >
> > > > > > > Hello to all again,
> > > > > > >
> > > > > > > Thank you for important comments and notes given below!
> > > > > > >
> > > > > > > Let me answer and continue the discussion.
> > > > > > >
> > > > > > > (I) Overall needs in Lucene 

Re: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-10-03 Thread Yuriy Shuliga
Ivan,

Regarding your question about the Lucene search response:
*IndexSearcher.search()* always returns results sorted, by *score*
(*relevance*) by default or by a defined *Sort*, which specifies the ordering
fields and rules.
This means that even now the *GridLuceneIndex* result is incorrect for
distributed queries, because per-node results are merged in arbitrary order.
Under the hood, a *ScoreDoc* object is used to fetch the desired document/record,
and this class contains the document ID (*doc*) and the *score*. So a small
wrapper implementing the Comparable interface may solve the merging of ordered
results.
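
To illustrate the wrapper idea, here is a minimal sketch, not the actual Ignite
code. The names ScoredEntry, ScoreMerge and mergeTop are hypothetical, and it
assumes that each node delivers its hits already sorted by descending score and
that scores from different nodes can be compared within one query response.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical wrapper: a result value together with the score it was found with. */
final class ScoredEntry<T> implements Comparable<ScoredEntry<T>> {
    final T value;
    final float score;

    ScoredEntry(T value, float score) {
        this.value = value;
        this.score = score;
    }

    /** Higher score sorts first, i.e. most relevant entries come first. */
    @Override public int compareTo(ScoredEntry<T> other) {
        return Float.compare(other.score, score);
    }
}

/** Hypothetical reducer-side helper: k-way merge of per-node sorted streams. */
final class ScoreMerge {
    /** Current head of one node's stream plus the remainder of that stream. */
    private static final class Cursor<T> implements Comparable<Cursor<T>> {
        ScoredEntry<T> head;
        final Iterator<ScoredEntry<T>> rest;

        Cursor(Iterator<ScoredEntry<T>> it) {
            rest = it;
            head = it.next();
        }

        @Override public int compareTo(Cursor<T> other) {
            return head.compareTo(other.head);
        }
    }

    /** Returns at most {@code limit} values in descending score order. */
    static <T> List<T> mergeTop(List<List<ScoredEntry<T>>> perNode, int limit) {
        PriorityQueue<Cursor<T>> heap = new PriorityQueue<>();

        for (List<ScoredEntry<T>> node : perNode) {
            if (!node.isEmpty())
                heap.add(new Cursor<>(node.iterator()));
        }

        List<T> out = new ArrayList<>();

        while (!heap.isEmpty() && out.size() < limit) {
            Cursor<T> best = heap.poll();   // stream with the highest-scored head
            out.add(best.head.value);

            if (best.rest.hasNext()) {      // advance that stream and re-queue it
                best.head = best.rest.next();
                heap.add(best);
            }
        }

        return out;
    }
}
```

Only the head of each node's stream is compared at any time, so the reducer
does not need the full result set in memory before it starts returning records.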

BR,
Yuriy Shuliha


Fri, 27 Sep 2019 at 18:48, Павлухин Иван wrote:

> Yuriy,
>
> Thank you for providing details! Quite interesting.
>
> Yes, we already have support of distributed limit and merging sorted
> subresults for SQL queries. E.g. ReduceIndexSorted and
> MergeStreamIterator are used for merging sorted streams.
>
> Could you please also clarify about score/relevance? Is it provided by
> Lucene engine for each query result? I am thinking how to do sorted
> merge properly in this case.
>
> Wed, 25 Sep 2019 at 18:56, Yuriy Shuliga:
> >
> > Ivan,
> >
> > Thank you for interesting question!
> >
> > Text searches (or full text searches) are mostly human-oriented. And the
> > point of user's interest is topmost part of response.
> > Then user can read it, evaluate and use the given records for further
> > purposes.
> >
> > Particularly in our case, we use Ignite for operations with financial
> data,
> > and there lots of text stuff like assets names, fin. instruments,
> companies
> > etc.
> > In order to operate with this quickly and reliably, users used to work
> with
> > text search, type-ahead completions, suggestions.
> >
> > For this purposes we are indexing particular string data in separate
> caches.
> >
> > Sorting capabilities and response size limitations are very important
> > there. As our API have to provide most relevant information in view of
> > limited size.
> >
> > Now let me comment some Ignite/Lucene perspective.
> > Actually Ignite queries and Lucene returns *TopDocs.scoresDocs *already
> > sorted by *score *(relevance). So most relevant documents are on the top.
> > And currently distributed queries responses from different nodes are
> merged
> > into final query cursor queue in arbitrary way.
> > So in fact we already have the score order ruined here. Also Ignite
> > requests all possible documents from Lucene that is redundant and not
> good
> > for performance.
> >
> > I'm implementing *limit* parameter to be part of *TextQuery *and have to
> > notice that we still have to add sorting for text queries processing in
> > order to have applicable results.
> >
> > *Limit* parameter itself should improve the part of issues from above,
> but
> > definitely, sorting by document score at least  should be implemented
> along
> > with limit.
> >
> > This is a pretty short commentary if you still have any questions, please
> > ask, do not hesitate)
> >
> > BR,
> > Yuriy Shuliha
> >
> > Thu, 19 Sep 2019 at 11:38, Павлухин Иван wrote:
> >
> > > Yuriy,
> > >
> > > Greatly appreciate your interest.
> > >
> > > Could you please elaborate a little bit about sorting? What tasks does
> > > it help to solve and how? It would be great to provide an example.
> > >
> > > Wed, 18 Sep 2019 at 09:39, Alexei Scherbakov <
> > > alexey.scherbak...@gmail.com>:
> > > >
> > > > Denis,
> > > >
> > > > I like the idea of throwing an exception for enabled text queries on
> > > > persistent caches.
> > > >
> > > > Also I'm fine with proposed limit for unsorted searches.
> > > >
> > > > Yury, please proceed with ticket creation.
> > > >
> > > > Tue, 17 Sep 2019, 22:06, Denis Magda:
> > > >
> > > > > Igniters,
> > > > >
> > > > > I see nothing wrong with Yury's proposal in regards full-text
> search
> > > API
> > > > > evolution as long as Yury is ready to push it forward.
> > > > >
> > > > > As for the in-memory mode only, it makes total sense for in-memory
> data
> > > > > grid deployments when Ignite caches data of an underlying DB like
> > > Postgres.
> > > > > As part of the changes, I would simply throw an exception (by
> default)
> > > if
> > > > > the one attempts to use text indices with the native persistence
> > > enabled.
> > > > > If the person is ready to live with that limitation that an
> explicit
> > > > > configuration change is needed to come around the exception.
> > > > >
> > > > > Thoughts?
> > > > >
> > > > >
> > > > > -
> > > > > Denis
> > > > >
> > > > >
> > > > > On Tue, Sep 17, 2019 at 7:44 AM Yuriy Shuliga 
> > > wrote:
> > > > >
> > > > > > Hello to all again,
> > > > > >
> > > > > > Thank you for important comments and notes given below!
> > > > > >
> > > > > > Let me answer and continue the discussion.
> > > > > >
> > > > > > (I) Overall needs in Lucene indexing
> > > > > >
> > > > > > Alexei has referenced to
> > > > > > https://issues.apache.org/jira/browse/IGNITE-5371 where
> > > > > > absence of index persistence was declared 

Re: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-09-30 Thread Denis Magda
Yuriy,

I see you have opened a pull request with the first changes:
https://issues.apache.org/jira/browse/IGNITE-12189

Alex Scherbakov and Ivan, are you the right people to do the review?

-
Denis


On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван  wrote:

> Yuriy,
>
> Thank you for providing details! Quite interesting.
>
> Yes, we already have support of distributed limit and merging sorted
> subresults for SQL queries. E.g. ReduceIndexSorted and
> MergeStreamIterator are used for merging sorted streams.
>
> Could you please also clarify about score/relevance? Is it provided by
> Lucene engine for each query result? I am thinking how to do sorted
> merge properly in this case.
>
> Wed, 25 Sep 2019 at 18:56, Yuriy Shuliga:
> >
> > Ivan,
> >
> > Thank you for interesting question!
> >
> > Text searches (or full text searches) are mostly human-oriented. And the
> > point of user's interest is topmost part of response.
> > Then user can read it, evaluate and use the given records for further
> > purposes.
> >
> > Particularly in our case, we use Ignite for operations with financial
> data,
> > and there lots of text stuff like assets names, fin. instruments,
> companies
> > etc.
> > In order to operate with this quickly and reliably, users used to work
> with
> > text search, type-ahead completions, suggestions.
> >
> > For this purposes we are indexing particular string data in separate
> caches.
> >
> > Sorting capabilities and response size limitations are very important
> > there. As our API have to provide most relevant information in view of
> > limited size.
> >
> > Now let me comment some Ignite/Lucene perspective.
> > Actually Ignite queries and Lucene returns *TopDocs.scoresDocs *already
> > sorted by *score *(relevance). So most relevant documents are on the top.
> > And currently distributed queries responses from different nodes are
> merged
> > into final query cursor queue in arbitrary way.
> > So in fact we already have the score order ruined here. Also Ignite
> > requests all possible documents from Lucene that is redundant and not
> good
> > for performance.
> >
> > I'm implementing *limit* parameter to be part of *TextQuery *and have to
> > notice that we still have to add sorting for text queries processing in
> > order to have applicable results.
> >
> > *Limit* parameter itself should improve the part of issues from above,
> but
> > definitely, sorting by document score at least  should be implemented
> along
> > with limit.
> >
> > This is a pretty short commentary if you still have any questions, please
> > ask, do not hesitate)
> >
> > BR,
> > Yuriy Shuliha
> >
> > Thu, 19 Sep 2019 at 11:38, Павлухин Иван wrote:
> >
> > > Yuriy,
> > >
> > > Greatly appreciate your interest.
> > >
> > > Could you please elaborate a little bit about sorting? What tasks does
> > > it help to solve and how? It would be great to provide an example.
> > >
> > > Wed, 18 Sep 2019 at 09:39, Alexei Scherbakov <
> > > alexey.scherbak...@gmail.com>:
> > > >
> > > > Denis,
> > > >
> > > > I like the idea of throwing an exception for enabled text queries on
> > > > persistent caches.
> > > >
> > > > Also I'm fine with proposed limit for unsorted searches.
> > > >
> > > > Yury, please proceed with ticket creation.
> > > >
> > > > Tue, 17 Sep 2019, 22:06, Denis Magda:
> > > >
> > > > > Igniters,
> > > > >
> > > > > I see nothing wrong with Yury's proposal in regards full-text
> search
> > > API
> > > > > evolution as long as Yury is ready to push it forward.
> > > > >
> > > > > As for the in-memory mode only, it makes total sense for in-memory
> data
> > > > > grid deployments when Ignite caches data of an underlying DB like
> > > Postgres.
> > > > > As part of the changes, I would simply throw an exception (by
> default)
> > > if
> > > > > the one attempts to use text indices with the native persistence
> > > enabled.
> > > > > If the person is ready to live with that limitation that an
> explicit
> > > > > configuration change is needed to come around the exception.
> > > > >
> > > > > Thoughts?
> > > > >
> > > > >
> > > > > -
> > > > > Denis
> > > > >
> > > > >
> > > > > On Tue, Sep 17, 2019 at 7:44 AM Yuriy Shuliga 
> > > wrote:
> > > > >
> > > > > > Hello to all again,
> > > > > >
> > > > > > Thank you for important comments and notes given below!
> > > > > >
> > > > > > Let me answer and continue the discussion.
> > > > > >
> > > > > > (I) Overall needs in Lucene indexing
> > > > > >
> > > > > > Alexei has referenced to
> > > > > > https://issues.apache.org/jira/browse/IGNITE-5371 where
> > > > > > absence of index persistence was declared as an obstacle to
> further
> > > > > > development.
> > > > > >
> > > > > > a) This ticket is already closed as not valid.b) There are
> definite
> > > needs
> > > > > > (and in our project as well) in just in-memory indexing of
> selected
> > > data.
> > > > > > We intend to use search capabilities for fetching limited amount
> of
> > > > > records
> > > > > > that 

Re: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-09-27 Thread Павлухин Иван
Yuriy,

Thank you for providing details! Quite interesting.

Yes, we already support distributed limit and merging of sorted
subresults for SQL queries. E.g. ReduceIndexSorted and
MergeStreamIterator are used for merging sorted streams.

Could you please also clarify how score/relevance works? Is it provided by
the Lucene engine for each query result? I am thinking about how to do a sorted
merge properly in this case.

Wed, 25 Sep 2019 at 18:56, Yuriy Shuliga:
>
> Ivan,
>
> Thank you for interesting question!
>
> Text searches (or full text searches) are mostly human-oriented. And the
> point of user's interest is topmost part of response.
> Then user can read it, evaluate and use the given records for further
> purposes.
>
> Particularly in our case, we use Ignite for operations with financial data,
> and there lots of text stuff like assets names, fin. instruments, companies
> etc.
> In order to operate with this quickly and reliably, users used to work with
> text search, type-ahead completions, suggestions.
>
> For this purposes we are indexing particular string data in separate caches.
>
> Sorting capabilities and response size limitations are very important
> there. As our API have to provide most relevant information in view of
> limited size.
>
> Now let me comment some Ignite/Lucene perspective.
> Actually Ignite queries and Lucene returns *TopDocs.scoresDocs *already
> sorted by *score *(relevance). So most relevant documents are on the top.
> And currently distributed queries responses from different nodes are merged
> into final query cursor queue in arbitrary way.
> So in fact we already have the score order ruined here. Also Ignite
> requests all possible documents from Lucene that is redundant and not good
> for performance.
>
> I'm implementing *limit* parameter to be part of *TextQuery *and have to
> notice that we still have to add sorting for text queries processing in
> order to have applicable results.
>
> *Limit* parameter itself should improve the part of issues from above, but
> definitely, sorting by document score at least  should be implemented along
> with limit.
>
> This is a pretty short commentary if you still have any questions, please
> ask, do not hesitate)
>
> BR,
> Yuriy Shuliha
>
> Thu, 19 Sep 2019 at 11:38, Павлухин Иван wrote:
>
> > Yuriy,
> >
> > Greatly appreciate your interest.
> >
> > Could you please elaborate a little bit about sorting? What tasks does
> > it help to solve and how? It would be great to provide an example.
> >
> > Wed, 18 Sep 2019 at 09:39, Alexei Scherbakov <
> > alexey.scherbak...@gmail.com>:
> > >
> > > Denis,
> > >
> > > I like the idea of throwing an exception for enabled text queries on
> > > persistent caches.
> > >
> > > Also I'm fine with proposed limit for unsorted searches.
> > >
> > > Yury, please proceed with ticket creation.
> > >
> > > Tue, 17 Sep 2019, 22:06, Denis Magda:
> > >
> > > > Igniters,
> > > >
> > > > I see nothing wrong with Yury's proposal in regards full-text search
> > API
> > > > evolution as long as Yury is ready to push it forward.
> > > >
> > > > As for the in-memory mode only, it makes total sense for in-memory data
> > > > grid deployments when Ignite caches data of an underlying DB like
> > Postgres.
> > > > As part of the changes, I would simply throw an exception (by default)
> > if
> > > > the one attempts to use text indices with the native persistence
> > enabled.
> > > > If the person is ready to live with that limitation that an explicit
> > > > configuration change is needed to come around the exception.
> > > >
> > > > Thoughts?
> > > >
> > > >
> > > > -
> > > > Denis
> > > >
> > > >
> > > > On Tue, Sep 17, 2019 at 7:44 AM Yuriy Shuliga 
> > wrote:
> > > >
> > > > > Hello to all again,
> > > > >
> > > > > Thank you for important comments and notes given below!
> > > > >
> > > > > Let me answer and continue the discussion.
> > > > >
> > > > > (I) Overall needs in Lucene indexing
> > > > >
> > > > > Alexei has referenced to
> > > > > https://issues.apache.org/jira/browse/IGNITE-5371 where
> > > > > absence of index persistence was declared as an obstacle to further
> > > > > development.
> > > > >
> > > > > a) This ticket is already closed as not valid.b) There are definite
> > needs
> > > > > (and in our project as well) in just in-memory indexing of selected
> > data.
> > > > > We intend to use search capabilities for fetching limited amount of
> > > > records
> > > > > that should be used in type-ahead search / suggestions.
> > > > > Not all of the data will be indexed and the are no need in Lucene
> > index
> > > > to
> > > > > be persistence. Hope this is a wide pattern of text-search usage.
> > > > >
> > > > > (II) Necessary fixes in current implementation.
> > > > >
> > > > > a) Implementation of correct *limit *(*offset* seems to be not
> > required
> > > > in
> > > > > text-search tasks for now)
> > > > > I have investigated the data flow for distributed text queries. it
> > was
> > > > > simple 

Re: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-09-25 Thread Yuriy Shuliga
Ivan,

Thank you for the interesting question!

Text searches (or full-text searches) are mostly human-oriented, and the
user's interest lies in the topmost part of the response.
The user can then read it, evaluate it, and use the returned records for further
purposes.

In our case in particular, we use Ignite for operations on financial data,
and there is a lot of text such as asset names, financial instruments, companies,
etc.
To work with this quickly and reliably, users are accustomed to
text search, type-ahead completion, and suggestions.

For these purposes we index particular string data in separate caches.

Sorting capabilities and response size limits are very important
there, as our API has to provide the most relevant information within a
limited response size.

Now let me comment from the Ignite/Lucene perspective.
Ignite queries Lucene, and Lucene returns *TopDocs.scoreDocs* already
sorted by *score* (relevance), so the most relevant documents are at the top.
Currently, the responses to a distributed query from different nodes are merged
into the final query cursor queue in arbitrary order,
so the score order is already ruined at that point. Also, Ignite
requests all possible documents from Lucene, which is redundant and bad
for performance.

I'm implementing a *limit* parameter as part of *TextQuery* and have to
note that we still need to add sorting to text query processing in
order to get usable results.

The *limit* parameter itself should mitigate some of the issues above, but
sorting, at least by document score, should definitely be implemented along
with the limit.
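
To show why limit and sorting belong together, here is a minimal sketch. It is
not taken from the Ignite code base; LimitSketch, Hit, nodeTop and globalTop are
hypothetical names, and scores from different nodes are assumed to be comparable
within one query. Each node returns only its own top *limit* hits, and the
requesting node sorts the candidates by score and cuts them to *limit* again.
The global top *limit* hits are always among the per-node tops, so nothing
relevant is lost while far fewer records cross the network.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical illustration of "limit + sort by score" for a distributed text query. */
final class LimitSketch {
    /** Hypothetical hit: a document key plus its relevance score. */
    static final class Hit {
        final String key;
        final float score;

        Hit(String key, float score) {
            this.key = key;
            this.score = score;
        }
    }

    /** Most relevant hits first. */
    static final Comparator<Hit> BY_SCORE_DESC =
        (a, b) -> Float.compare(b.score, a.score);

    /** What a single node would send back: its best {@code limit} hits only. */
    static List<Hit> nodeTop(List<Hit> nodeHits, int limit) {
        return nodeHits.stream()
            .sorted(BY_SCORE_DESC)
            .limit(limit)
            .collect(Collectors.toList());
    }

    /** What the requesting node would do with the truncated per-node responses. */
    static List<String> globalTop(List<List<Hit>> nodes, int limit) {
        List<Hit> candidates = new ArrayList<>();

        for (List<Hit> node : nodes)
            candidates.addAll(nodeTop(node, limit));

        return candidates.stream()
            .sorted(BY_SCORE_DESC)
            .limit(limit)
            .map(h -> h.key)
            .collect(Collectors.toList());
    }
}
```

Without the sort step, a limit alone would return an arbitrary subset of the
matching records rather than the most relevant ones.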

This is a fairly short commentary; if you still have any questions, please
do not hesitate to ask.

BR,
Yuriy Shuliha

Thu, 19 Sep 2019 at 11:38, Павлухин Иван wrote:

> Yuriy,
>
> Greatly appreciate your interest.
>
> Could you please elaborate a little bit about sorting? What tasks does
> it help to solve and how? It would be great to provide an example.
>
> Wed, 18 Sep 2019 at 09:39, Alexei Scherbakov <
> alexey.scherbak...@gmail.com>:
> >
> > Denis,
> >
> > I like the idea of throwing an exception for enabled text queries on
> > persistent caches.
> >
> > Also I'm fine with proposed limit for unsorted searches.
> >
> > Yury, please proceed with ticket creation.
> >
> > Tue, 17 Sep 2019, 22:06, Denis Magda:
> >
> > > Igniters,
> > >
> > > I see nothing wrong with Yury's proposal in regards full-text search
> API
> > > evolution as long as Yury is ready to push it forward.
> > >
> > > As for the in-memory mode only, it makes total sense for in-memory data
> > > grid deployments when Ignite caches data of an underlying DB like
> Postgres.
> > > As part of the changes, I would simply throw an exception (by default)
> if
> > > the one attempts to use text indices with the native persistence
> enabled.
> > > If the person is ready to live with that limitation that an explicit
> > > configuration change is needed to come around the exception.
> > >
> > > Thoughts?
> > >
> > >
> > > -
> > > Denis
> > >
> > >
> > > On Tue, Sep 17, 2019 at 7:44 AM Yuriy Shuliga 
> wrote:
> > >
> > > > Hello to all again,
> > > >
> > > > Thank you for important comments and notes given below!
> > > >
> > > > Let me answer and continue the discussion.
> > > >
> > > > (I) Overall needs in Lucene indexing
> > > >
> > > > Alexei has referenced to
> > > > https://issues.apache.org/jira/browse/IGNITE-5371 where
> > > > absence of index persistence was declared as an obstacle to further
> > > > development.
> > > >
> > > > a) This ticket is already closed as not valid.b) There are definite
> needs
> > > > (and in our project as well) in just in-memory indexing of selected
> data.
> > > > We intend to use search capabilities for fetching limited amount of
> > > records
> > > > that should be used in type-ahead search / suggestions.
> > > > Not all of the data will be indexed and the are no need in Lucene
> index
> > > to
> > > > be persistence. Hope this is a wide pattern of text-search usage.
> > > >
> > > > (II) Necessary fixes in current implementation.
> > > >
> > > > a) Implementation of correct *limit *(*offset* seems to be not
> required
> > > in
> > > > text-search tasks for now)
> > > > I have investigated the data flow for distributed text queries. it
> was
> > > > simple test prefix query, like 'name'*='ene*'*
> > > > For now each server-node returns all response records to the
> client-node
> > > > and it may contain ~thousands, ~hundred thousands records.
> > > > Event if we need only first 10-100. Again, all the results are added
> to
> > > > queue in GridCacheQueryFutureAdapter in arbitrary order by pages.
> > > > I did not find here any means to deliver deterministic result.
> > > > So implementing limit as part of query and (GridCacheQueryRequest)
> will
> > > not
> > > > change the nature of response but will limit load on nodes and
> > > networking.
> > > >
> > > > Can we consider to open a ticket for this?
> > > >
> > > > (III) Further extension of 

Re: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-09-19 Thread Павлухин Иван
Yuriy,

Greatly appreciate your interest.

Could you please elaborate a little on sorting? What tasks does
it help to solve, and how? It would be great if you could provide an example.

Wed, 18 Sep 2019 at 09:39, Alexei Scherbakov:
>
> Denis,
>
> I like the idea of throwing an exception for enabled text queries on
> persistent caches.
>
> Also I'm fine with proposed limit for unsorted searches.
>
> Yury, please proceed with ticket creation.
>
> Tue, 17 Sep 2019, 22:06, Denis Magda:
>
> > Igniters,
> >
> > I see nothing wrong with Yury's proposal in regards full-text search API
> > evolution as long as Yury is ready to push it forward.
> >
> > As for the in-memory mode only, it makes total sense for in-memory data
> > grid deployments when Ignite caches data of an underlying DB like Postgres.
> > As part of the changes, I would simply throw an exception (by default) if
> > the one attempts to use text indices with the native persistence enabled.
> > If the person is ready to live with that limitation that an explicit
> > configuration change is needed to come around the exception.
> >
> > Thoughts?
> >
> >
> > -
> > Denis
> >
> >
> > On Tue, Sep 17, 2019 at 7:44 AM Yuriy Shuliga  wrote:
> >
> > > Hello to all again,
> > >
> > > Thank you for important comments and notes given below!
> > >
> > > Let me answer and continue the discussion.
> > >
> > > (I) Overall needs in Lucene indexing
> > >
> > > Alexei has referenced to
> > > https://issues.apache.org/jira/browse/IGNITE-5371 where
> > > absence of index persistence was declared as an obstacle to further
> > > development.
> > >
> > > a) This ticket is already closed as not valid.b) There are definite needs
> > > (and in our project as well) in just in-memory indexing of selected data.
> > > We intend to use search capabilities for fetching limited amount of
> > records
> > > that should be used in type-ahead search / suggestions.
> > > Not all of the data will be indexed and the are no need in Lucene index
> > to
> > > be persistence. Hope this is a wide pattern of text-search usage.
> > >
> > > (II) Necessary fixes in current implementation.
> > >
> > > a) Implementation of a correct *limit* (*offset* does not seem to be
> > > required in text-search tasks for now)
> > > I have investigated the data flow for distributed text queries, using a
> > > simple test prefix query like 'name'*='ene*'*
> > > For now, each server node returns all response records to the client node,
> > > and the response may contain thousands or hundreds of thousands of records,
> > > even if we need only the first 10-100. Again, all the results are added to
> > > the queue in GridCacheQueryFutureAdapter in arbitrary order, page by page.
> > > I did not find any means here to deliver a deterministic result.
> > > So implementing limit as part of the query (and GridCacheQueryRequest) will
> > > not change the nature of the response, but it will limit the load on nodes
> > > and networking.
> > >
> > > Can we consider to open a ticket for this?
> > >
> > > (III) Further extension of Lucene API exposition to Ignite
> > >
> > > a) Sorting
> > > The solution for this could be:
> > > - Make entities comparable
> > > - Add custom comparator to entity
> > > - Add annotations to mark sorted fields for Lucene indexing
> > > - Use comparators when merging responses or reducing to desired limit on
> > > client node.
> > > This will require the full result set to be loaded into memory, though it
> > > can be used for relatively small limits.
> > > BR,
> > > Yuriy Shuliha
> > >
> > > Fri, 30 Aug 2019 at 10:37, Alexei Scherbakov <alexey.scherbak...@gmail.com>
> > > wrote:
> > >
> > > > Yuriy,
> > > >
> > > > Note that one of the major blockers for text queries is [1], which makes
> > > > Lucene indexes unusable with persistence and is the main reason for the
> > > > discontinuation.
> > > > It should probably be addressed first to make text queries a valid
> > > > product feature.
> > > >
> > > > Distributed sorting and advanced querying are indeed not a trivial task.
> > > > Some kind of merging must be implemented on the query-originating node.
> > > >
> > > > [1] https://issues.apache.org/jira/browse/IGNITE-5371
> > > >
> > > > Thu, 29 Aug 2019 at 23:38, Denis Magda:
> > > >
> > > > > Yuriy,
> > > > >
> > > > > If you are ready to take over the full-text search indexes, then
> > > > > please go ahead. The primary reasons why the community wants to
> > > > > discontinue them first (and, probably, resurrect them later) are the
> > > > > limitations listed by Andrey and minimal support from the community's
> > > > > end.
> > > > >
> > > > > -
> > > > > Denis
> > > > >
> > > > >
> > > > > On Thu, Aug 29, 2019 at 1:29 PM Andrey Mashenkov <
> > > > > andrey.mashen...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Yuriy,
> > > > > >
> > > > > > Unfortunately, there is a plan to discontinue TextQueries in
> > > > > > Ignite [1].
> > > > > > The motivation is that text indexes are not persistent, not
> > > > > > transactional, and
> > > > > > can't be used together 

Re: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-09-18 Thread Alexei Scherbakov
Denis,

I like the idea of throwing an exception when text queries are enabled on
persistent caches.

I'm also fine with the proposed limit for unsorted searches.

Yuriy, please proceed with ticket creation.


Re: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-09-17 Thread Denis Magda
Igniters,

I see nothing wrong with Yuriy's proposal regarding the evolution of the
full-text search API, as long as Yuriy is ready to push it forward.

As for the in-memory-only mode, it makes total sense for in-memory data
grid deployments where Ignite caches data of an underlying DB like Postgres.
As part of the changes, I would simply throw an exception (by default) if
someone attempts to use text indexes with native persistence enabled.
If the user is ready to live with that limitation, an explicit
configuration change would be needed to get around the exception.
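
A minimal sketch of such a default-deny check, for illustration only. The
class name TextIndexGuard, the opt-in flag allowInMemoryTextIndex and the
message text are hypothetical and are not part of the Ignite API; the only
point is that the exception is thrown unless the user explicitly opts in.

```java
/** Hypothetical validation helper; none of these names exist in Ignite. */
final class TextIndexGuard {
    private TextIndexGuard() {
    }

    /**
     * Rejects a cache configuration that combines native persistence with
     * text indexes, unless the user has explicitly accepted that the text
     * index lives in memory only.
     */
    static void validate(boolean persistenceEnabled,
                         boolean hasTextIndex,
                         boolean allowInMemoryTextIndex) {
        if (persistenceEnabled && hasTextIndex && !allowInMemoryTextIndex) {
            throw new IllegalStateException(
                "Text indexes are not persisted; set the (hypothetical) "
                    + "allowInMemoryTextIndex flag to use them with native persistence.");
        }
    }

    public static void main(String[] args) {
        validate(false, true, false); // in-memory cache: accepted
        validate(true, true, true);   // persistent cache, explicit opt-in: accepted
        try {
            validate(true, true, false); // persistent cache, no opt-in: rejected
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```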

Thoughts?


-
Denis



Re: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-09-17 Thread Yuriy Shuliga
Hello to all again,

Thank you for the important comments and notes given below!

Let me answer and continue the discussion.

(I) Overall needs in Lucene indexing

Alexei has referred to
https://issues.apache.org/jira/browse/IGNITE-5371, where the
absence of index persistence was declared an obstacle to further
development.

a) This ticket is already closed as not valid.
b) There is a definite need (in our project as well) for purely in-memory
indexing of selected data.
We intend to use search capabilities to fetch a limited number of records
for type-ahead search / suggestions.
Not all of the data will be indexed, and there is no need for the Lucene index
to be persistent. I hope this is a common pattern of text-search usage.

(II) Necessary fixes in current implementation.

a) Implementation of a correct *limit* (*offset* does not seem to be required
in text-search tasks for now)
I have investigated the data flow for distributed text queries, using a
simple test prefix query like 'name'='ene*'.
For now, each server node returns all matching records to the client node,
and the response may contain thousands or hundreds of thousands of records,
even if we need only the first 10-100. Moreover, all the results are added to
the queue in GridCacheQueryFutureAdapter page by page, in arbitrary order.
I did not find any means here to deliver a deterministic result.
So implementing the limit as part of the query (and of GridCacheQueryRequest)
will not change the nature of the response, but it will limit the load on the
nodes and the network.

Can we consider opening a ticket for this?

(III) Further exposure of the Lucene API in Ignite

a) Sorting
The solution for this could be:
- Make entities comparable
- Add a custom comparator to the entity
- Add annotations to mark sorted fields for Lucene indexing
- Use comparators when merging responses or reducing to the desired limit on
the client node.
This will require the full result set to be loaded into memory, though it can
be used for relatively small limits.
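
To make the last bullet concrete, here is a rough sketch of the client-side
reduce step: all pages are gathered in memory, sorted with a comparator and
cut to the limit. ClientSideReduce, Doc and BY_RANK_DESC are made-up names
for illustration and do not correspond to existing Ignite classes.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Illustrative only: reduces pages from all nodes to a sorted, limited list. */
final class ClientSideReduce {
    /** Hypothetical entity with a field to sort on. */
    static final class Doc {
        final String name;
        final int rank;

        Doc(String name, int rank) {
            this.name = name;
            this.rank = rank;
        }
    }

    /** Example of a custom comparator: highest rank first. */
    static final Comparator<Doc> BY_RANK_DESC =
        Comparator.comparingInt((Doc d) -> d.rank).reversed();

    /** Loads every page into memory, sorts, and keeps the first 'limit' items. */
    static <T> List<T> reduce(List<List<T>> pages, Comparator<? super T> cmp, int limit) {
        return pages.stream()
            .flatMap(List::stream)
            .sorted(cmp)
            .limit(limit)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Doc> node1 = Arrays.asList(new Doc("a", 3), new Doc("b", 9));
        List<Doc> node2 = Arrays.asList(new Doc("c", 7), new Doc("d", 1));
        for (Doc d : reduce(Arrays.asList(node1, node2), BY_RANK_DESC, 2))
            System.out.println(d.name + " " + d.rank);
    }
}
```

The memory cost grows with the total number of returned records, which is why
this is only reasonable when each node has already applied a small limit.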
BR,
Yuriy Shuliha


Re: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-08-30 Thread Alexei Scherbakov
Yuriy,

Note that one of the major blockers for text queries is [1], which makes Lucene
indexes unusable with persistence and is the main reason for the discontinuation.
It should probably be addressed first to make text queries a valid
product feature.

Distributed sorting and advanced querying are indeed not a trivial task.
Some kind of merging must be implemented on the query-originating node.

[1] https://issues.apache.org/jira/browse/IGNITE-5371


Re: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-08-29 Thread Denis Magda
Yuriy,

If you are ready to take over the full-text search indexes, then please go
ahead. The primary reasons why the community wants to discontinue them first
(and, probably, resurrect them later) are the limitations listed by Andrey and
minimal support from the community's end.

-
Denis




Re: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-08-29 Thread Andrey Mashenkov
Hi Yuriy,

Unfortunately, there is a plan to discontinue TextQueries in Ignite [1].
The motivation is that text indexes are not persistent, not transactional, and
can't be used together with SQL or inside SQL,
and there is a lack of interest from the community side.
You are welcome to take on these issues and make TextQueries great.

1. PageSize can't be used to limit the result set.
Query results are returned from a data node to the client-side cursor page by
page, and this parameter is designed to control the page size. The query is
supposed to execute lazily on the server side, and
the full result set is not expected to be loaded into memory on the server side
at once, but by pages.
Do you mean you found that Lucene loads the entire result set into memory before
the first page is sent to the client?

I'd think a new parameter should be added to limit the result. The best
solution is to use query language commands for this, e.g. "LIMIT/OFFSET" in
SQL.

This task doesn't look trivial. A query is a distributed operation: the same
user query will be executed on the data nodes,
and then the results from all nodes should be correctly merged before being
returned via the client cursor.
So, LIMIT should be applied on every node and then again in the merge phase.
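
A rough sketch of that two-phase LIMIT, assuming each node can return its
rows sorted by a shared comparator (which text queries cannot do today).
LimitedMerge, nodeTopN and merge are hypothetical names, not Ignite code.
Each node keeps at most 'limit' rows, and the reducer does a k-way merge of
the sorted per-node lists, stopping once 'limit' rows have been emitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Illustrative only: LIMIT on every node, then LIMIT again while merging. */
final class LimitedMerge {
    /** Tracks the current head of one node's sorted result list. */
    private static final class Cursor<T> {
        T head;
        final Iterator<T> rest;

        Cursor(Iterator<T> it) {
            rest = it;
            head = it.next();
        }

        boolean advance() {
            if (!rest.hasNext())
                return false;
            head = rest.next();
            return true;
        }
    }

    /** Data-node side: sort local rows and keep at most 'limit' of them. */
    static <T> List<T> nodeTopN(List<T> rows, Comparator<? super T> cmp, int limit) {
        List<T> sorted = new ArrayList<>(rows);
        sorted.sort(cmp);
        return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
    }

    /** Reducer side: k-way merge of sorted per-node lists, cut at 'limit'. */
    static <T> List<T> merge(List<List<T>> perNode, Comparator<? super T> cmp, int limit) {
        Comparator<Cursor<T>> byHead = (a, b) -> cmp.compare(a.head, b.head);
        PriorityQueue<Cursor<T>> heap = new PriorityQueue<>(byHead);
        for (List<T> rows : perNode)
            if (!rows.isEmpty())
                heap.add(new Cursor<>(rows.iterator()));

        List<T> out = new ArrayList<>();
        while (out.size() < limit && !heap.isEmpty()) {
            Cursor<T> c = heap.poll();   // cursor with the smallest current head
            out.add(c.head);
            if (c.advance())
                heap.add(c);             // re-insert with its next row as head
        }
        return out;
    }

    public static void main(String[] args) {
        Comparator<Integer> cmp = Comparator.naturalOrder();
        List<Integer> a = nodeTopN(Arrays.asList(5, 1, 9), cmp, 3);
        List<Integer> b = nodeTopN(Arrays.asList(4, 2, 8), cmp, 3);
        System.out.println(merge(Arrays.asList(a, b), cmp, 3));
    }
}
```

Because no node can contribute more than 'limit' rows to the final answer,
cutting on the nodes does not change the merged result; it only reduces the
amount of data sent over the network.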

Also, this may be non-obvious: limiting results makes no sense without
sorting,
as there is no guarantee that every subsequent query run will return the same
data, because of page reordering.
Basically, the merge phase receives results from data nodes asynchronously, and
messages from different nodes can't be ordered.

2.
a. A "tokenize" param name (for @QueryTextField) looks more verbose, doesn't
it?
b,c. What about distributed queries? How will partial results from nodes be
merged?
Does Lucene allow configuring a comparator for data sorting?
What comparator should Ignite choose to sort results in the merge phase?

3. For now, the Lucene engine is not configurable at all. E.g., it is impossible
to configure the Tokenizer.
I'd think about possible ways to configure the engine first and only then go
further to discuss/implement complex features
that may depend on the engine config.





-- 
Best regards,
Andrey V. Mashenkov


Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-08-29 Thread Yuriy Shuliga
Dear community,

By starting this thread, I'd like to open a discussion that would lead to
contributions in the subject area.

Ignite has indexing capabilities backed by different mechanisms,
including Lucene.

Currently, Lucene 7.5.0 is used (last year's release).
This is a widespread and mature technology that covers the text search area
and beyond (e.g. spatial data indexing).

My goal is to *expose more Lucene functionality to Ignite indexing and
query mechanisms for text data*.

It's quite a simple request at the current stage. It comes from our project's
needs, but I believe it will be useful for many more people.
Let's walk through the items and vote on or discuss Jira tickets for them.

1. [trivial] Use dataQuery.getPageSize() to limit search response items
inside GridLuceneIndex.query(). Currently it calls
IndexSearcher.search(query, *Integer.MAX_VALUE*), so basically all scored
matches will be returned, which we do not need in most cases.

2. [simple] Add sorting. Then a more capable search call can be
executed: *IndexSearcher.search(query, count, sort)*
Implementation steps:
a) Introduce a boolean *sortField* parameter in the *@QueryTextField* annotation.
If *true*, the field will be indexed but not tokenized. Number types are
preferred here.
b) Add a *sort* collection to the *TextQuery* constructor. It should define
the desired sort fields used for querying.
c) Implement Lucene sort usage in GridLuceneIndex.query().
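
To illustrate step a), here is a small sketch of how such a flag could be
declared and discovered by reflection. It does not touch the real
@QueryTextField annotation: SketchTextField, SortFieldScan and Article are
stand-in names invented for this example, and the actual indexing code may
need a different shape.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for the text-field annotation with the proposed flag. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface SketchTextField {
    /** If true, the field would be indexed but not tokenized, for sorting. */
    boolean sortField() default false;
}

/** Illustrative only: finds the fields an indexer would treat as sort fields. */
final class SortFieldScan {
    /** Example entity: 'title' is a regular text field, 'issued' a sort field. */
    static final class Article {
        @SketchTextField
        String title;

        @SketchTextField(sortField = true)
        long issued;

        String internalNote; // not annotated, so not indexed at all
    }

    /** Returns the names of fields marked with sortField = true, sorted by name. */
    static List<String> sortFields(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            SketchTextField ann = f.getAnnotation(SketchTextField.class);
            if (ann != null && ann.sortField())
                names.add(f.getName());
        }
        Collections.sort(names); // reflection does not guarantee field order
        return names;
    }

    public static void main(String[] args) {
        System.out.println(sortFields(Article.class));
    }
}
```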

3. [moderate] Build complex queries with *TextQuery*, including
term/query boosting.
*This section is for voting only, as it requires more detailed work. It should be
extended if the community is interested in it.*

Looking forward to your comments!

BR,
Yuriy Shuliha