Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2020-03-13 Thread Yuriy Shuliga
Ivan,

I have made changes in the fork that reflect the merge-sort strategy. The
query future iterator now unblocks as soon as the first pages have been
delivered from all nodes; it then waits for the next portion of pages, and so on.
https://github.com/shuliga/ignite/commit/c84f04c18f67e99ab7bc0a7893b75f1dc83a76bd
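
For illustration only, the merge-sort idea can be sketched as a k-way merge over per-node streams that are each already sorted by descending score. This is not the code from the commit: ScoredRow and ScoreMerge are hypothetical names, and in-memory lists stand in for pages arriving from nodes, so nothing actually blocks here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical holder for one result record key and its Lucene score. */
final class ScoredRow {
    final String key;
    final float score;

    ScoredRow(String key, float score) {
        this.key = key;
        this.score = score;
    }
}

/** K-way merge of per-node result streams, each already sorted by descending score. */
final class ScoreMerge {
    /** Current head of one node's stream plus the rest of that stream. */
    private static final class Head {
        final ScoredRow row;
        final Iterator<ScoredRow> rest;

        Head(ScoredRow row, Iterator<ScoredRow> rest) {
            this.row = row;
            this.rest = rest;
        }
    }

    static List<String> merge(List<List<ScoredRow>> perNode) {
        // Max-heap on score: the best remaining head across all nodes comes out first.
        PriorityQueue<Head> heads =
            new PriorityQueue<>((a, b) -> Float.compare(b.row.score, a.row.score));

        // The merge can only start once every node has contributed its first row;
        // in a real query future this is where the iterator would have to wait.
        for (List<ScoredRow> rows : perNode) {
            Iterator<ScoredRow> it = rows.iterator();
            if (it.hasNext())
                heads.add(new Head(it.next(), it));
        }

        List<String> out = new ArrayList<>();
        while (!heads.isEmpty()) {
            Head h = heads.poll();
            out.add(h.row.key);
            // Refill from the same node; a real implementation would wait here
            // for that node's next page if its current page is exhausted.
            if (h.rest.hasNext())
                heads.add(new Head(h.rest.next(), h.rest));
        }
        return out;
    }

    public static void main(String[] args) {
        List<ScoredRow> nodeA = Arrays.asList(new ScoredRow("a1", 0.9f), new ScoredRow("a2", 0.4f));
        List<ScoredRow> nodeB = Arrays.asList(new ScoredRow("b1", 0.7f), new ScoredRow("b2", 0.6f));
        System.out.println(merge(Arrays.asList(nodeA, nodeB)));
    }
}
```

The heap holds at most one entry per node, so the initiating node only ever compares the current heads instead of buffering and re-sorting whole responses.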

Please validate the design if you wish.

Regarding the ranking field in the entity:

Entities for text queries in the search domain are usually treated as
documents with some metadata.
This can be an id, an issued/expired date, and the document score returned for
a given query.
It is common to include such fields in the entity design.

Answer to your question about omitting QueryRankField:
- The response records will then simply come in arbitrary order. This
should not fail TextQuery execution.

Another point, about rank values across different indices:
- Ranks are to be used for comparing entities within a particular query
response; they are not intended to be absolute across the system.

Let me summarize the approaches:
1. Subclassing from Ranked.class.
pros: the simplest and most Ignite-natural approach
cons: implicit nature, limits entity inheritance

2. Explicitly introducing a dedicated field annotated with @QueryRankField.
pros: Ignite-natural approach, easy to introduce, explicitly controlled by
the developer
cons: adds extra metadata to the entity

3. Wrapping the entity response with rank data that is used for the merge sort
but not exposed to the client.
pros: leaves the entity design clean
cons: rank is not available to the client; development will require complex
changes in the query execution / entity marshaling mechanisms
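
As a rough illustration of option 2, the sketch below declares a marker annotation and fills the annotated field by reflection. Everything in it is an assumption made for the example: @QueryRankField is only the name proposed in this thread and is not a released Ignite API, and Instrument and RankInjector are hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

/** Hypothetical marker annotation; the name comes from this discussion only. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface QueryRankField {
}

/** Example entity: the rank is plain metadata next to the business fields. */
class Instrument {
    String name;

    @QueryRankField
    float rank;

    Instrument(String name) {
        this.name = name;
    }
}

final class RankInjector {
    /**
     * Writes the score into the first float field annotated with @QueryRankField.
     * Returns false when the entity has no such field, so the caller can fall
     * back to returning results in arbitrary order.
     */
    static boolean injectRank(Object entity, float score) {
        for (Field f : entity.getClass().getDeclaredFields()) {
            if (f.isAnnotationPresent(QueryRankField.class) && f.getType() == float.class) {
                try {
                    f.setAccessible(true);
                    f.setFloat(entity, score);
                }
                catch (IllegalAccessException e) {
                    throw new IllegalStateException("Cannot set rank field", e);
                }
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Instrument i = new Instrument("ACME bond");
        System.out.println(injectRank(i, 0.75f) + " " + i.rank);
        System.out.println(injectRank("no rank field here", 0.75f));
    }
}
```

The boolean result mirrors the answer above: an entity without the annotated field is not an error, its records just cannot be ordered by rank.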

I'd stay with option 2 as the most balanced solution of these.
What do you think?

BR,
Yuriy Shuliha




Wed, 11 Mar 2020 at 01:14, Ivan Pavlukhin wrote:

> Igniters,
>
> The discussion unintentionally continued outside of the dev list. I am
> bringing it back. You can find it below. Do not hesitate to join if you
> have thoughts on the questions raised. Maybe you have ideas on how to enrich
> text query results with score/rank information.
>
> Tue, 10 Mar 2020 at 09:11, Yuriy Shuliga wrote:
>
> > Yes, please do.
> >
> > Tue, 10 Mar 2020, 02:26, Ivan Pavlukhin wrote:
> >
> >> Yuriy,
> >>
> >> I noticed that from some point our discussion moved out of Ignite dev
> >> list. Would you mind if I return it back to dev list?
> >>
> >> Best regards,
> >> Ivan Pavlukhin
> >>
> >> Tue, 10 Mar 2020 at 03:25, Ivan Pavlukhin wrote:
> >> >
> >> > > PS As far as I see, there is no chance to get on the 2.8 release train.
> >> > > What will be the next version/date we can aim for with this update?
> >> >
> >> > Yes, 2.8 is already available and the community is working on
> >> > finalizing activities (e.g. publishing documentation). I do not have any
> >> > reliable expectations about next releases. I suppose that there could be a
> >> > couple of maintenance releases like 2.8.1, as several problems were already
> >> > discovered. I do not know whether the next more significant release is going
> >> > to be 2.9 or even a major release 3.0. It sounds realistic to facilitate 2.9
> >> > because there are already several "almost ready" features in master. In my
> >> > opinion it is a good idea to start a discussion about next releases on the
> >> > dev list.
> >> >
> >> > Best regards,
> >> > Ivan Pavlukhin
> >> >
> >> > Tue, 10 Mar 2020 at 00:58, Ivan Pavlukhin wrote:
> >> > >
> >> > > Hi Yuriy,
> >> > >
> >> > > Sorry for a late response.
> >> > >
> >> > > > Suitable solution without subclassing might be:
> >> > > > 1. Explicitly add float field to entity
> >> > > > 2. Annotate it with special @QueryRankField, (for instance)
> >> > > > 3. Fill in this field with docScore in GridLuceneIndex, pass it back
> >> > > > to the initiating node
> >> > > > 4. Possibly still need to proxy the entity to add the Comparable
> >> > > > interface.
> >> > > > 5. Perform merge sort on initiating node
> >> > >
> >> > > Possibly I missed it but one moment is not clear for me. What will
> >> > > happen if an entity class does not have a field annotated with
> >> > > QueryRankField?
> >> > >
> >> > > And I am still not sure that it is a proper (enough) approach. The
> >> > > thing which bothers me is a transient and dynamic nature of "rank"
> >> > > field. It does belong to entity, it can have different values for
> the
> >> > > same entity (e.g. different indice

Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2020-01-23 Thread Yuriy Shuliga
Hi Ivan,

Actually, I have engaged another developer to help bring TextQueries to
a correctly working state.
For now we have a solution that adds ordering functionality to distributed
TextQueries.
It is developed and tested locally. I can share details here, then we can
discuss and decide whether to create a corresponding ticket.

The starting point is that, by nature, Lucene's result documents are always
ordered by docScore (a float).
So we created an abstract class Ranked, which implements Comparable and
Serializable and contains a float rank value.

Each entity expected to be ordered on TextQuery merge should be
derived from this class.
All subsequent actions are done automatically under the hood by the
new CacheQueryFutureRankedDecorator, which contains a special
BlockingIterator used for the correct merge of distributed responses.
Text queries with Ranked entities are automatically wrapped with this new
decorator.
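
To make the shape of such a base class concrete, here is a minimal sketch. It only follows the description above (abstract, Comparable, Serializable, a float rank); the accessor names, the descending ordering and the Company entity are assumptions for illustration, not the actual implementation.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Sketch of the base class: carries the score and orders instances by it. */
abstract class Ranked implements Comparable<Ranked>, Serializable {
    private static final long serialVersionUID = 1L;

    /** Score assigned by the index for the current query; not business data. */
    private float rank;

    float getRank() {
        return rank;
    }

    void setRank(float rank) {
        this.rank = rank;
    }

    /** Higher rank sorts first, so a sorted list starts with the most relevant entity. */
    @Override public int compareTo(Ranked other) {
        return Float.compare(other.rank, rank);
    }
}

/** Hypothetical entity that opts into ordering by extending Ranked. */
class Company extends Ranked {
    private static final long serialVersionUID = 1L;

    final String name;

    Company(String name, float rank) {
        this.name = name;
        setRank(rank);
    }
}

final class RankedDemo {
    /** Returns entity names in descending rank order. */
    static List<String> namesByRank(List<Company> companies) {
        List<Company> sorted = new ArrayList<>(companies);
        Collections.sort(sorted);

        List<String> names = new ArrayList<>();
        for (Company c : sorted)
            names.add(c.name);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(namesByRank(Arrays.asList(
            new Company("Beta", 0.2f), new Company("Alpha", 0.8f))));
    }
}
```

Note that this ordering compares rank only, so it is not consistent with equals; that is acceptable for merging but worth keeping in mind if such entities end up in sorted sets.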

This is an outline of the solution. Please ask if you have any questions.
Or I can create a ticket and link a PR with the already tested (so far only
locally) solution to it for detailed review.

BR,
Yuriy


Tue, 21 Jan 2020 at 07:29, Ivan Pavlukhin wrote:

> Hi Yuriy,
>
> I would just like to understand the current state. Are you still working on
> Ignite text queries? If not, are you going to continue with it?
>
> Fri, 13 Dec 2019 at 11:52, Ivan Pavlukhin wrote:
> >
> > Yuriy,
> >
> > Sure, I will be glad to help.
> >
> > > - incorrect nodes/partition selection during querying?
> > Apparently this is the problem. If you feel it really complicated to
> > understand and debug then I can dig deeper and share my vision how the
> > problem can be fixed.
> >
> > Wed, 11 Dec 2019 at 18:46, Yuriy Shuliga wrote:
> > >
> > > I will look into the MOVING partition issue.
> > > But I also need guidance there.
> > >
> > > Ivan, would you mind being that person?
> > >
> > > The question is whether we have an issue with:
> > > -  wrong storing targets during indexing OR
> > > - incorrect nodes/partition selection during querying?
> > >
> > > BR,
> > > Yuriy Shuliha
> > >
> > >
> > >
> > > --
> > > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> >
> >
> >
> > --
> > Best regards,
> > Ivan Pavlukhin
>
>
>
> --
> Best regards,
> Ivan Pavlukhin
>


Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-12-11 Thread Yuriy Shuliga
I will look into the MOVING partition issue.
But I also need guidance there.

Ivan, would you mind being that person?

The question is whether we have an issue with:
-  wrong storing targets during indexing OR 
- incorrect nodes/partition selection during querying?

BR,
Yuriy Shuliha





Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-11-28 Thread Yuriy Shuliga
Good to hear, Ivan.

It is good practice to make sure that an extension of existing functionality
is properly presented, as we expect of Text Queries.
Let's make it work correctly first.

I'm OK to prepare a ticket for adding reduction of sorted responses to
GridCacheDistributedQueryFuture or somewhere nearby.
Also, the TextQuery response entity will be extended to carry Lucene's
'docScore' per record.
No open questions are left then.

BR,
Yuriy Shuliha

Thu, 28 Nov 2019 at 15:23, Ivan Pavlukhin wrote:

> Folks, Yuriy,
>
> I suppose that we are going to proceed with
>
> >>>
> Reducing on Ignite
>
> The obvious point of distributed response reduction is class
> GridCacheDistributedQueryFuture.
> Though, @Ivan Pavlukhin mentioned class with similar functionality:
> ReduceIndexSorted
> What I see here, that it is tangled with H2 related classes
> (org.h2.result.Row) and might not be unified with TextQuery reduction.
> >>
>
> From my side there is no strict opinion that we should unify
> reduction. Having a separate reduction implementation for text queries
> sounds for me as not bad option as well.
>
> Are there still any open questions?
>
> Wed, 27 Nov 2019 at 02:27, Denis Magda wrote:
> >
> > I don't see anything wrong if Yuriy is willing to carry on and keep
> > enhancing our full-text search support that lacks basic capabilities.
> >
> > The basics should be available. If anybody needs an advanced feature they
> > can introduce Solr or Elasticsearch into the final architecture of the app.
> >
> > Folks, who of us can help Yuriy with the questions asked? Most likely the
> > SQL experts are the best candidates here.
> >
> >
> > -
> > Denis
> >
> >
> > On Tue, Nov 26, 2019 at 8:52 AM Ivan Pavlukhin 
> wrote:
> >
> > > Folks,
> > >
> > > IEP is an Ignite-specific thing. In fact, I suppose that we are
> > > already doing it in ASF way by having this dev-list discussion =)
> > >
> > > As for me, implementing "limit" feature for text queries is not so big
> > > to make an IEP. But we might need to create one for next features.
> > >
> > > Tue, 26 Nov 2019 at 15:06, Ilya Kasnacheev <ilya.kasnach...@gmail.com>:
> > > >
> > > > Hello!
> > > >
> > > > ASF way should probably start with an IEP :)
> > > >
> > > > Regards,
> > > > --
> > > > Ilya Kasnacheev
> > > >
> > > >
> > > > Tue, 26 Nov 2019 at 14:12, Zhenya Stanilovsky:
> > > >
> > > > >
> > > > > Ok, let's forget Solr and go the ASF way: if Yuriy proves this
> > > > > functionality is helpful and submits a PR, why not?
> > > > >
> > > > > Isn't it so?
> > > > >
> > > > > >Tuesday, 26 November 2019, 14:06 +03:00, from Ilya Kasnacheev <ilya.kasnach...@gmail.com>:
> > > > > >
> > > > > >Hello!
> > > > > >
> > > > > >The problem here is that Solr is a multi-year effort by a lot of
> > > people.
> > > > > We
> > > > > >can't match that.
> > > > > >
> > > > > >Maybe we could integrate with Solr/Solr Cloud instead, by feeding
> our
> > > > > cache
> > > > > >information into their storage for indexing and relying on their
> own
> > > > > >mechanisms for distributed IR sorting?
> > > > > >
> > > > > >Regards,
> > > > > >--
> > > > > >Ilya Kasnacheev
> > > > > >
> > > > > >
> > > > > >Tue, 26 Nov 2019 at 13:59, Zhenya Stanilovsky <arzamas...@mail.ru.invalid>:
> > > > > >
> > > > > >>
> > > > > >> Ilya Kasnacheev, what is the problem with Solr and Ignite functionality?
> > > > > >>
> > > > > >> thanks !
> > > > > >>
> > > > > >> >Tuesday, 26 November 2019, 13:50 +03:00, from Ilya Kasnacheev <ilya.kasnach...@gmail.com>:
> > > > > >> >
> > > > > >> >Hello!
> > > > > >> >
> > > > > >> >I have a hunch that we are trying to build Apache Solr (or Solr
> > > Cloud)
> > > > > >> into
> > >

Re: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-11-22 Thread Yuriy Shuliga
Dear Igniters,

The first part of the TextQuery improvement, a result limit, was developed
and merged.
Now we have to develop the most important functionality here: proper sorting
of Lucene index responses and correct reduction of them for distributed
queries.

*There are two Lucene-based aspects*

1. When no sorting fields are used, the documents in the response are still
ordered by relevance.
This is the ScoreDoc.score value.
In order to reduce the distributed results correctly, the score should be
passed with the response.

2. When sorting by conventional fields, Lucene should have these
fields properly indexed, and the
corresponding Sort object should be applied to Lucene's search call.
To mark those fields, a new annotation like '@SortField' may be
introduced.

*Reducing on Ignite*

The obvious point of distributed response reduction is the class
GridCacheDistributedQueryFuture.
However, @Ivan Pavlukhin mentioned a class with similar functionality:
ReduceIndexSorted.
What I see here is that it is tangled with H2-related classes
(org.h2.result.Row) and might not be unifiable with TextQuery reduction.

I still need support here.

Overall, the goal of this letter is to initiate a discussion on the TextQuery
sorting implementation and come closer to ticket creation.

BR,
Yuriy Shuliha

Tue, 22 Oct 2019 at 13:31, Andrey Mashenkov wrote:

> Hi Dmitry, Yuriy.
>
> I've found that GridCacheQueryFutureAdapter has a newly added AtomicInteger
> 'total' field and a 'limit' field as a primitive int.
>
> Both fields are used inside synchronized block only.
> So, we can make both private and downgrade AtomicInteger to primitive int.
>
> Most likely, these fields can be replaced with one field.
>
>
>
> On Mon, Oct 21, 2019 at 10:01 PM Dmitriy Pavlov 
> wrote:
>
> > Hi Andrey,
> >
> > I've checked this ticket comments, and there is a TC Bot visa (with no
> > blockers).
> >
> > Do you have any concerns related to this patch?
> >
> > Sincerely,
> > Dmitriy Pavlov
> >
> > Thu, 17 Oct 2019 at 16:43, Yuriy Shuliga wrote:
> >
> >>   Andrey,
> >>
> >> Per your request, I created the ticket
> >> https://issues.apache.org/jira/browse/IGNITE-12291   linked to
> >> https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12189
> >>
> >> Could you please proceed with PR merge ?
> >>
> >> BR,
> >> Yuriy Shuliha
> >>
> >> Wed, 9 Oct 2019 at 12:52, Andrey Mashenkov wrote:
> >>
> >> > Hi Yuri,
> >> >
> >> > To get access to TC Bot you should register as TeamCity user [1], if
> you
> >> > didn't do this already.
> >> > Then you will be able to authorize on Ignite TC Bot page with same
> >> > credentials.
> >> >
> >> > [1] https://ci.ignite.apache.org/registerUser.html
> >> >
> >> > On Fri, Oct 4, 2019 at 3:10 PM Yuriy Shuliga 
> wrote:
> >> >
> >> >> Andrew,
> >> >>
> >> >> I have corrected PR according to your notes. Please review.
> >> >> What will be the next steps in order to merge in?
> >> >>
> >> >> Y.
> >> >>
> >> >> Thu, 3 Oct 2019 at 17:47, Andrey Mashenkov <andrey.mashen...@gmail.com> wrote:
> >> >>
> >> >> > Yuri,
> >> >> >
> >> >> > I've done with review.
> >> >> > No crime found, but trivial compatibility bug.
> >> >> >
> >> >> > On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga 
> >> wrote:
> >> >> >
> >> >> > > Denis,
> >> >> > >
> >> >> > > Thank you for your attention to this.
> >> >> > > as for now, the
> https://issues.apache.org/jira/browse/IGNITE-12189
> >> >> > ticket
> >> >> > > is still pending review.
> >> >> > > Do we have a chance to move it forward somehow?
> >> >> > >
> >> >> > > BR,
> >> >> > > Yuriy Shuliha
> >> >> > >
> >> >> > > Mon, 30 Sep 2019 at 23:35, Denis Magda wrote:
> >> >> > >
> >> >> > > > Yuriy,
> >> >> > > >
> >> >> > > > I've seen you opening a pull-request with the first changes:
> >> >> > > > https://issues.apache.org/jira/browse/IGNITE-12189
> >> >> > > >
> >> >> > > > Alex Scherbakov and Ivan are you the right guys to do the
> review?

Re: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-10-17 Thread Yuriy Shuliga
Andrey,

Per your request, I created the ticket
https://issues.apache.org/jira/browse/IGNITE-12291   linked to
https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12189

Could you please proceed with PR merge ?

BR,
Yuriy Shuliha

Wed, 9 Oct 2019 at 12:52, Andrey Mashenkov wrote:

> Hi Yuri,
>
> To get access to TC Bot you should register as TeamCity user [1], if you
> didn't do this already.
> Then you will be able to authorize on Ignite TC Bot page with same
> credentials.
>
> [1] https://ci.ignite.apache.org/registerUser.html
>
> On Fri, Oct 4, 2019 at 3:10 PM Yuriy Shuliga  wrote:
>
>> Andrew,
>>
>> I have corrected PR according to your notes. Please review.
>> What will be the next steps in order to merge in?
>>
>> Y.
>>
>> Thu, 3 Oct 2019 at 17:47, Andrey Mashenkov wrote:
>>
>> > Yuri,
>> >
>> > I've done with review.
>> > No crime found, but trivial compatibility bug.
>> >
>> > On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga  wrote:
>> >
>> > > Denis,
>> > >
>> > > Thank you for your attention to this.
>> > > as for now, the https://issues.apache.org/jira/browse/IGNITE-12189
>> > ticket
>> > > is still pending review.
>> > > Do we have a chance to move it forward somehow?
>> > >
>> > > BR,
>> > > Yuriy Shuliha
>> > >
>> > > Mon, 30 Sep 2019 at 23:35, Denis Magda wrote:
>> > >
>> > > > Yuriy,
>> > > >
>> > > > I've seen you opening a pull-request with the first changes:
>> > > > https://issues.apache.org/jira/browse/IGNITE-12189
>> > > >
>> > > > Alex Scherbakov and Ivan are you the right guys to do the review?
>> > > >
>> > > > -
>> > > > Denis
>> > > >
>> > > >
>> > > > On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван 
>> > > wrote:
>> > > >
>> > > > > Yuriy,
>> > > > >
>> > > > > Thank you for providing details! Quite interesting.
>> > > > >
>> > > > > Yes, we already have support of distributed limit and merging
>> sorted
>> > > > > subresults for SQL queries. E.g. ReduceIndexSorted and
>> > > > > MergeStreamIterator are used for merging sorted streams.
>> > > > >
>> > > > > Could you please also clarify about score/relevance? Is it
>> provided
>> > by
>> > > > > Lucene engine for each query result? I am thinking how to do
>> sorted
>> > > > > merge properly in this case.
>> > > > >
>> > > > > Wed, 25 Sep 2019 at 18:56, Yuriy Shuliga wrote:
>> > > > > >
>> > > > > > Ivan,
>> > > > > >
>> > > > > > Thank you for interesting question!
>> > > > > >
>> > > > > > Text searches (or full text searches) are mostly human-oriented.
>> > And
>> > > > the
>> > > > > > point of user's interest is topmost part of response.
>> > > > > > Then user can read it, evaluate and use the given records for
>> > further
>> > > > > > purposes.
>> > > > > >
>> > > > > > Particularly in our case, we use Ignite for operations with
>> > financial
>> > > > > data,
>> > > > > > and there lots of text stuff like assets names, fin.
>> instruments,
>> > > > > companies
>> > > > > > etc.
>> > > > > > In order to operate with this quickly and reliably, users used
>> to
>> > > work
>> > > > > with
>> > > > > > text search, type-ahead completions, suggestions.
>> > > > > >
>> > > > > > For this purposes we are indexing particular string data in
>> > separate
>> > > > > caches.
>> > > > > >
>> > > > > > Sorting capabilities and response size limitations are very
>> > important
>> > > > > > there. As our API have to provide most relevant information in
>> view
>> > > of
>> > > > > > limited size.
>> > > > > >
>> > > > > > Now let me comment some Ignite/Lucene perspective.
>> > > > >

Re: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-10-04 Thread Yuriy Shuliga
Andrew,

I have corrected the PR according to your notes. Please review.
What are the next steps to get it merged?

Y.

Thu, 3 Oct 2019 at 17:47, Andrey Mashenkov wrote:

> Yuri,
>
> I've done with review.
> No crime found, but trivial compatibility bug.
>
> On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga  wrote:
>
> > Denis,
> >
> > Thank you for your attention to this.
> > as for now, the https://issues.apache.org/jira/browse/IGNITE-12189
> ticket
> > is still pending review.
> > Do we have a chance to move it forward somehow?
> >
> > BR,
> > Yuriy Shuliha
> >
> > Mon, 30 Sep 2019 at 23:35, Denis Magda wrote:
> >
> > > Yuriy,
> > >
> > > I've seen you opening a pull-request with the first changes:
> > > https://issues.apache.org/jira/browse/IGNITE-12189
> > >
> > > Alex Scherbakov and Ivan are you the right guys to do the review?
> > >
> > > -
> > > Denis
> > >
> > >
> > > On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван 
> > wrote:
> > >
> > > > Yuriy,
> > > >
> > > > Thank you for providing details! Quite interesting.
> > > >
> > > > Yes, we already have support of distributed limit and merging sorted
> > > > subresults for SQL queries. E.g. ReduceIndexSorted and
> > > > MergeStreamIterator are used for merging sorted streams.
> > > >
> > > > Could you please also clarify about score/relevance? Is it provided
> by
> > > > Lucene engine for each query result? I am thinking how to do sorted
> > > > merge properly in this case.
> > > >
> > > > Wed, 25 Sep 2019 at 18:56, Yuriy Shuliga wrote:
> > > > >
> > > > > Ivan,
> > > > >
> > > > > Thank you for interesting question!
> > > > >
> > > > > Text searches (or full text searches) are mostly human-oriented.
> And
> > > the
> > > > > point of user's interest is topmost part of response.
> > > > > Then user can read it, evaluate and use the given records for
> further
> > > > > purposes.
> > > > >
> > > > > Particularly in our case, we use Ignite for operations with
> financial
> > > > data,
> > > > > and there lots of text stuff like assets names, fin. instruments,
> > > > companies
> > > > > etc.
> > > > > In order to operate with this quickly and reliably, users used to
> > work
> > > > with
> > > > > text search, type-ahead completions, suggestions.
> > > > >
> > > > > For this purposes we are indexing particular string data in
> separate
> > > > caches.
> > > > >
> > > > > Sorting capabilities and response size limitations are very
> important
> > > > > there. As our API have to provide most relevant information in view
> > of
> > > > > limited size.
> > > > >
> > > > > Now let me comment some Ignite/Lucene perspective.
> > > > > Actually Ignite queries and Lucene returns *TopDocs.scoresDocs
> > *already
> > > > > sorted by *score *(relevance). So most relevant documents are on
> the
> > > top.
> > > > > And currently distributed queries responses from different nodes
> are
> > > > merged
> > > > > into final query cursor queue in arbitrary way.
> > > > > So in fact we already have the score order ruined here. Also Ignite
> > > > > requests all possible documents from Lucene that is redundant and
> not
> > > > good
> > > > > for performance.
> > > > >
> > > > > I'm implementing *limit* parameter to be part of *TextQuery *and
> have
> > > to
> > > > > notice that we still have to add sorting for text queries
> processing
> > in
> > > > > order to have applicable results.
> > > > >
> > > > > *Limit* parameter itself should improve the part of issues from
> > above,
> > > > but
> > > > > definitely, sorting by document score at least  should be
> implemented
> > > > along
> > > > > with limit.
> > > > >
> > > > > This is a pretty short commentary if you still have any questions,
> > > please
> > > > > ask, do not hesitate)
> > > > >
> > > > > BR,
> > 

Re: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-10-04 Thread Yuriy Shuliga
Ivan,

Yes, your observation is correct.

This behavior has been there from the very beginning, since Lucene indexing was
implemented for distributed queries.
Implementing the *limit* solves the problem of redundant response
size. Without it, *ALL* of the records are fetched each time, which is not
good, especially for loose patterns.
To solve the relevance issue, correct sorting should be implemented.
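
A small sketch of the difference, using made-up data and hypothetical names (Hit, LimitDemo): with a limit but no sorted merge, the first N arrivals win; with sorting, the N best scores win.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical search hit: a record key plus its relevance score. */
final class Hit {
    final String key;
    final float score;

    Hit(String key, float score) {
        this.key = key;
        this.score = score;
    }
}

final class LimitDemo {
    /** Limit without sorting: keep whatever arrived first. */
    static List<String> firstArrived(List<Hit> arrivalOrder, int limit) {
        return arrivalOrder.stream()
            .limit(limit)
            .map(h -> h.key)
            .collect(Collectors.toList());
    }

    /** Limit with sorting: keep the highest-scored hits regardless of arrival order. */
    static List<String> topByScore(List<Hit> hits, int limit) {
        return hits.stream()
            .sorted(Comparator.comparingDouble((Hit h) -> h.score).reversed())
            .limit(limit)
            .map(h -> h.key)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // A fast node delivers two weak matches before a slower node delivers two strong ones.
        List<Hit> arrival = Arrays.asList(
            new Hit("a1", 0.3f), new Hit("a2", 0.2f),
            new Hit("b1", 0.9f), new Hit("b2", 0.8f));

        System.out.println(firstArrived(arrival, 2));
        System.out.println(topByScore(arrival, 2));
    }
}
```

With this data the arrival-order variant keeps a1 and a2, while the score-ordered variant keeps b1 and b2, which is exactly the relevance gap Ivan describes.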

Y.

Fri, 4 Oct 2019 at 10:45, Ivan Pavlukhin wrote:

> Yuriy,
>
> Am I getting it right that in your PR, if we have a limit N, then the
> returned items (at most N) will not strictly be the most relevant
> ones? E.g. if one node returned N items faster than the others but with
> lower relevance?
>
> Thu, 3 Oct 2019 at 17:47, Andrey Mashenkov wrote:
> >
> > Yuri,
> >
> > I've done with review.
> > No crime found, but trivial compatibility bug.
> >
> > On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga  wrote:
> >
> > > Denis,
> > >
> > > Thank you for your attention to this.
> > > as for now, the https://issues.apache.org/jira/browse/IGNITE-12189
> ticket
> > > is still pending review.
> > > Do we have a chance to move it forward somehow?
> > >
> > > BR,
> > > Yuriy Shuliha
> > >
> > > Mon, 30 Sep 2019 at 23:35, Denis Magda wrote:
> > >
> > > > Yuriy,
> > > >
> > > > I've seen you opening a pull-request with the first changes:
> > > > https://issues.apache.org/jira/browse/IGNITE-12189
> > > >
> > > > Alex Scherbakov and Ivan are you the right guys to do the review?
> > > >
> > > > -
> > > > Denis
> > > >
> > > >
> > > > On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван 
> > > wrote:
> > > >
> > > > > Yuriy,
> > > > >
> > > > > Thank you for providing details! Quite interesting.
> > > > >
> > > > > Yes, we already have support of distributed limit and merging
> sorted
> > > > > subresults for SQL queries. E.g. ReduceIndexSorted and
> > > > > MergeStreamIterator are used for merging sorted streams.
> > > > >
> > > > > Could you please also clarify about score/relevance? Is it
> provided by
> > > > > Lucene engine for each query result? I am thinking how to do sorted
> > > > > merge properly in this case.
> > > > >
> > > > > Wed, 25 Sep 2019 at 18:56, Yuriy Shuliga wrote:
> > > > > >
> > > > > > Ivan,
> > > > > >
> > > > > > Thank you for interesting question!
> > > > > >
> > > > > > Text searches (or full text searches) are mostly human-oriented.
> And
> > > > the
> > > > > > point of user's interest is topmost part of response.
> > > > > > Then user can read it, evaluate and use the given records for
> further
> > > > > > purposes.
> > > > > >
> > > > > > Particularly in our case, we use Ignite for operations with
> financial
> > > > > data,
> > > > > > and there lots of text stuff like assets names, fin. instruments,
> > > > > companies
> > > > > > etc.
> > > > > > In order to operate with this quickly and reliably, users used to
> > > work
> > > > > with
> > > > > > text search, type-ahead completions, suggestions.
> > > > > >
> > > > > > For this purposes we are indexing particular string data in
> separate
> > > > > caches.
> > > > > >
> > > > > > Sorting capabilities and response size limitations are very
> important
> > > > > > there. As our API have to provide most relevant information in
> view
> > > of
> > > > > > limited size.
> > > > > >
> > > > > > Now let me comment some Ignite/Lucene perspective.
> > > > > > Actually Ignite queries and Lucene returns *TopDocs.scoresDocs
> > > *already
> > > > > > sorted by *score *(relevance). So most relevant documents are on
> the
> > > > top.
> > > > > > And currently distributed queries responses from different nodes
> are
> > > > > merged
> > > > > > into final query cursor queue in arbitrary way.
> > > > > > So in fact we already have the score order ruined here. Also
> Ignite

Re: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-10-03 Thread Yuriy Shuliga
Denis,

Thank you for your attention to this.
As of now, the https://issues.apache.org/jira/browse/IGNITE-12189 ticket
is still pending review.
Is there a chance to move it forward somehow?

BR,
Yuriy Shuliha

Mon, 30 Sep 2019 at 23:35, Denis Magda wrote:

> Yuriy,
>
> I've seen you opening a pull-request with the first changes:
> https://issues.apache.org/jira/browse/IGNITE-12189
>
> Alex Scherbakov and Ivan are you the right guys to do the review?
>
> -
> Denis
>
>
> On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван  wrote:
>
> > Yuriy,
> >
> > Thank you for providing details! Quite interesting.
> >
> > Yes, we already have support of distributed limit and merging sorted
> > subresults for SQL queries. E.g. ReduceIndexSorted and
> > MergeStreamIterator are used for merging sorted streams.
> >
> > Could you please also clarify about score/relevance? Is it provided by
> > Lucene engine for each query result? I am thinking how to do sorted
> > merge properly in this case.
> >
> > Wed, 25 Sep 2019 at 18:56, Yuriy Shuliga wrote:
> > >
> > > Ivan,
> > >
> > > Thank you for interesting question!
> > >
> > > Text searches (or full text searches) are mostly human-oriented. And
> the
> > > point of user's interest is topmost part of response.
> > > Then user can read it, evaluate and use the given records for further
> > > purposes.
> > >
> > > Particularly in our case, we use Ignite for operations with financial
> > data,
> > > and there lots of text stuff like assets names, fin. instruments,
> > companies
> > > etc.
> > > In order to operate with this quickly and reliably, users used to work
> > with
> > > text search, type-ahead completions, suggestions.
> > >
> > > For this purposes we are indexing particular string data in separate
> > caches.
> > >
> > > Sorting capabilities and response size limitations are very important
> > > there. As our API have to provide most relevant information in view of
> > > limited size.
> > >
> > > Now let me comment some Ignite/Lucene perspective.
> > > Actually Ignite queries and Lucene returns *TopDocs.scoresDocs *already
> > > sorted by *score *(relevance). So most relevant documents are on the
> top.
> > > And currently distributed queries responses from different nodes are
> > merged
> > > into final query cursor queue in arbitrary way.
> > > So in fact we already have the score order ruined here. Also Ignite
> > > requests all possible documents from Lucene that is redundant and not
> > good
> > > for performance.
> > >
> > > I'm implementing *limit* parameter to be part of *TextQuery *and have
> to
> > > notice that we still have to add sorting for text queries processing in
> > > order to have applicable results.
> > >
> > > *Limit* parameter itself should improve the part of issues from above,
> > but
> > > definitely, sorting by document score at least  should be implemented
> > along
> > > with limit.
> > >
> > > This is a pretty short commentary if you still have any questions,
> please
> > > ask, do not hesitate)
> > >
> > > BR,
> > > Yuriy Shuliha
> > >
> > > Thu, 19 Sep 2019 at 11:38, Павлухин Иван wrote:
> > >
> > > > Yuriy,
> > > >
> > > > Greatly appreciate your interest.
> > > >
> > > > Could you please elaborate a little bit about sorting? What tasks
> does
> > > > it help to solve and how? It would be great to provide an example.
> > > >
> > > > Wed, 18 Sep 2019 at 09:39, Alexei Scherbakov <alexey.scherbak...@gmail.com>:
> > > > >
> > > > > Denis,
> > > > >
> > > > > I like the idea of throwing an exception for enabled text queries
> on
> > > > > persistent caches.
> > > > >
> > > > > Also I'm fine with proposed limit for unsorted searches.
> > > > >
> > > > > Yury, please proceed with ticket creation.
> > > > >
> > > > > Tue, 17 Sep 2019, 22:06, Denis Magda wrote:
> > > > >
> > > > > > Igniters,
> > > > > >
> > > > > > I see nothing wrong with Yury's proposal in regards full-text
> > search
> > > > API
> > > > > > evolution as long as Yury is ready to push it forward.
>

Re: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-10-03 Thread Yuriy Shuliga
Ivan,

Regarding your question about the Lucene search response:
*IndexSearcher.search()* always returns results sorted at least by *score*
(*relevance*), or by a defined *Sort*, which includes ordering fields and
rules.
This means that even now the *GridLuceneIndex* result will be incorrect in
the case of distributed queries, as they are merged in an arbitrary way.
Under the hood, a *ScoreDoc* object is used to fetch the desired document/record,
and this class contains the document id (*doc*) and the *score*. So a small
wrapper with the Comparable interface may solve the merging of ordered results.
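
Such a wrapper might look roughly like the sketch below. ScoredEntry and WrapperDemo are hypothetical names, and the tie-break on the document id is an assumption added only to make the order deterministic; document ids are local to each node's index, so they carry no meaning across nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical wrapper pairing a record with the score reported for it. */
final class ScoredEntry<V> implements Comparable<ScoredEntry<V>> {
    final V value;
    final int docId;
    final float score;

    ScoredEntry(V value, int docId, float score) {
        this.value = value;
        this.docId = docId;
        this.score = score;
    }

    /** Higher score first; equal scores fall back to ascending document id. */
    @Override public int compareTo(ScoredEntry<V> other) {
        int byScore = Float.compare(other.score, score);
        return byScore != 0 ? byScore : Integer.compare(docId, other.docId);
    }
}

final class WrapperDemo {
    /** Combines per-node results and returns the unwrapped values, best score first. */
    static <V> List<V> mergeSorted(List<List<ScoredEntry<V>>> perNode) {
        List<ScoredEntry<V>> all = new ArrayList<>();
        for (List<ScoredEntry<V>> part : perNode)
            all.addAll(part);

        Collections.sort(all);

        List<V> values = new ArrayList<>();
        for (ScoredEntry<V> e : all)
            values.add(e.value);
        return values;
    }

    public static void main(String[] args) {
        List<ScoredEntry<String>> node1 = Arrays.asList(
            new ScoredEntry<String>("a", 3, 0.9f), new ScoredEntry<String>("b", 7, 0.4f));
        List<ScoredEntry<String>> node2 = Arrays.asList(
            new ScoredEntry<String>("c", 1, 0.6f));
        List<List<ScoredEntry<String>>> parts = Arrays.asList(node1, node2);

        System.out.println(mergeSorted(parts));
    }
}
```

Because the wrapper is stripped before values are returned, the entity classes themselves stay unchanged; the trade-off is that the caller never sees the score.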

BR,
Yuriy Shuliha


Fri, 27 Sep 2019 at 18:48, Павлухин Иван wrote:

> Yuriy,
>
> Thank you for providing details! Quite interesting.
>
> Yes, we already have support of distributed limit and merging sorted
> subresults for SQL queries. E.g. ReduceIndexSorted and
> MergeStreamIterator are used for merging sorted streams.
>
> Could you please also clarify about score/relevance? Is it provided by
> Lucene engine for each query result? I am thinking how to do sorted
> merge properly in this case.
>
> Wed, 25 Sep 2019 at 18:56, Yuriy Shuliga wrote:
> >
> > Ivan,
> >
> > Thank you for interesting question!
> >
> > Text searches (or full-text searches) are mostly human-oriented, and the
> > user's point of interest is the topmost part of the response.
> > The user can then read it, evaluate it and use the given records for further
> > purposes.
> >
> > In our case in particular, we use Ignite for operations with financial data,
> > and there is a lot of text such as asset names, financial instruments,
> > companies, etc.
> > To operate on this quickly and reliably, users are used to working with
> > text search, type-ahead completions and suggestions.
> >
> > For these purposes we index particular string data in separate caches.
> >
> > Sorting capabilities and response size limits are very important
> > there, as our API has to provide the most relevant information within a
> > limited size.
> >
> > Now let me comment on the Ignite/Lucene perspective.
> > Ignite queries Lucene, and Lucene returns *TopDocs.scoreDocs* already
> > sorted by *score* (relevance), so the most relevant documents are at the top.
> > Currently, distributed query responses from different nodes are merged
> > into the final query cursor queue in an arbitrary way.
> > So in fact the score order is already ruined here. Also, Ignite
> > requests all possible documents from Lucene, which is redundant and not good
> > for performance.
> >
> > I'm implementing a *limit* parameter as part of *TextQuery* and have to
> > note that we still have to add sorting to text query processing in
> > order to get usable results.
> >
> > The *limit* parameter itself should improve some of the issues above, but
> > sorting, at least by document score, should definitely be implemented along
> > with the limit.
> >
> > This is a pretty short commentary; if you still have any questions, please
> > do not hesitate to ask.
> >
> > BR,
> > Yuriy Shuliha
> >
> > Thu, 19 Sep 2019 at 11:38, Ivan Pavlukhin wrote:
> >
> > > Yuriy,
> > >
> > > Greatly appreciate your interest.
> > >
> > > Could you please elaborate a little bit about sorting? What tasks does
> > > it help to solve and how? It would be great to provide an example.
> > >
> > > Wed, 18 Sep 2019 at 09:39, Alexei Scherbakov <
> > > alexey.scherbak...@gmail.com>:
> > > >
> > > > Denis,
> > > >
> > > > I like the idea of throwing an exception for enabled text queries on
> > > > persistent caches.
> > > >
> > > > Also I'm fine with proposed limit for unsorted searches.
> > > >
> > > > Yury, please proceed with ticket creation.
> > > >
> > > > Tue, 17 Sep 2019, 22:06 Denis Magda :
> > > >
> > > > > Igniters,
> > > > >
> > > > > I see nothing wrong with Yury's proposal in regards full-text
> search
> > > API
> > > > > evolution as long as Yury is ready to push it forward.
> > > > >
> > > > > As for the in-memory mode only, it makes total sense for in-memory
> data
> > > > > grid deployments when Ignite caches data of an underlying DB like
> > > Postgres.
> > > > > As part of the changes, I would simply throw an exception (by
> default)
> > > if
> > > > > the one attempts to use text indices with the native persistence
> >

Re: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-09-25 Thread Yuriy Shuliga
Ivan,

Thank you for the interesting question!

Text searches (or full-text searches) are mostly human-oriented, and the
user's point of interest is the topmost part of the response.
The user can then read it, evaluate it, and use the given records for further
purposes.

In our case in particular, we use Ignite for operations on financial data,
and there is a lot of text, such as asset names, financial instruments,
companies, etc.
To work with this quickly and reliably, users are used to working with
text search, type-ahead completion, and suggestions.

For these purposes we index particular string data in separate caches.

Sorting capabilities and response size limits are very important
there, as our API has to provide the most relevant information within a
limited size.

Now let me comment from the Ignite/Lucene perspective.
Ignite queries Lucene, and Lucene returns *TopDocs.scoreDocs* already
sorted by *score* (relevance), so the most relevant documents are at the top.
Currently, distributed query responses from different nodes are merged
into the final query cursor queue in an arbitrary way.
So in fact the score order is already ruined here. Also, Ignite
requests all possible documents from Lucene, which is redundant and bad
for performance.

I'm implementing a *limit* parameter as part of *TextQuery* and have to
note that we still need to add sorting to text query processing in
order to get usable results.

The *limit* parameter by itself should mitigate some of the issues above, but
sorting, at least by document score, should definitely be implemented along
with the limit.
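
To illustrate why limit and score sorting belong together, here is a small
sketch under stated assumptions: every node returns only its own best
`limit` hits, and the reducer re-sorts the combined pages and cuts to `limit`
again. The class LimitedReduce and its Hit type are hypothetical and are not
Ignite or Lucene API.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class LimitedReduce {
    /** Hypothetical hit: a record name with its relevance score. */
    static final class Hit {
        final String name;
        final float score;

        Hit(String name, float score) {
            this.name = name;
            this.score = score;
        }
    }

    /** Highest score first. */
    private static final Comparator<Hit> BY_SCORE_DESC =
        (x, y) -> Float.compare(y.score, x.score);

    /** What one node would send back: its own best `limit` hits. */
    static List<Hit> topOnNode(List<Hit> nodeHits, int limit) {
        return nodeHits.stream()
            .sorted(BY_SCORE_DESC)
            .limit(limit)
            .collect(Collectors.toList());
    }

    /** Reduce step: combine per-node pages, re-sort, cut to `limit` again. */
    static List<Hit> reduce(List<List<Hit>> perNodeTop, int limit) {
        return perNodeTop.stream()
            .flatMap(List::stream)
            .sorted(BY_SCORE_DESC)
            .limit(limit)
            .collect(Collectors.toList());
    }
}
```

Any hit in the global top `limit` is also in the top `limit` of its own node,
so no node has to send more than `limit` records. Without the sort in the
reduce step, the cut would keep an arbitrary subset instead of the most
relevant one.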

This is a fairly short commentary; if you still have any questions, please
do not hesitate to ask.

BR,
Yuriy Shuliha

Thu, 19 Sep 2019 at 11:38, Ivan Pavlukhin wrote:

> Yuriy,
>
> Greatly appreciate your interest.
>
> Could you please elaborate a little bit about sorting? What tasks does
> it help to solve and how? It would be great to provide an example.
>
> Wed, 18 Sep 2019 at 09:39, Alexei Scherbakov <
> alexey.scherbak...@gmail.com>:
> >
> > Denis,
> >
> > I like the idea of throwing an exception for enabled text queries on
> > persistent caches.
> >
> > Also I'm fine with proposed limit for unsorted searches.
> >
> > Yury, please proceed with ticket creation.
> >
> > Tue, 17 Sep 2019, 22:06 Denis Magda :
> >
> > > Igniters,
> > >
> > > I see nothing wrong with Yury's proposal regarding the evolution of the
> > > full-text search API, as long as Yury is ready to push it forward.
> > >
> > > As for the in-memory-only mode, it makes total sense for in-memory data
> > > grid deployments where Ignite caches data from an underlying DB such as
> > > Postgres.
> > > As part of the changes, I would simply throw an exception (by default)
> > > if someone attempts to use text indices with native persistence
> > > enabled.
> > > If the person is ready to live with that limitation, then an explicit
> > > configuration change is needed to get around the exception.
> > >
> > > Thoughts?
> > >
> > >
> > > -
> > > Denis
> > >
> > >
> > > On Tue, Sep 17, 2019 at 7:44 AM Yuriy Shuliga 
> wrote:
> > >
> > > > Hello to all again,
> > > >
> > > > Thank you for the important comments and notes given below!
> > > >
> > > > Let me answer and continue the discussion.
> > > >
> > > > (I) Overall need for Lucene indexing
> > > >
> > > > Alexei referred to
> > > > https://issues.apache.org/jira/browse/IGNITE-5371 where the
> > > > absence of index persistence was declared an obstacle to further
> > > > development.
> > > >
> > > > a) This ticket is already closed as not valid.
> > > > b) There is a definite need (in our project as well) for purely
> > > > in-memory indexing of selected data.
> > > > We intend to use search capabilities to fetch a limited number of
> > > > records for use in type-ahead search / suggestions.
> > > > Not all of the data will be indexed, and there is no need for the
> > > > Lucene index to be persistent. I hope this is a common pattern of
> > > > text-search usage.
> > > >
> > > > (II) Necessary fixes in the current implementation.
> > > >
> > > > a) Implementation of a correct *limit* (*offset* does not seem to be
> > > > required in text-search tasks for now).
> > > > I have investigated the data flow for distributed text queries. It
> > > > was a simple test prefix query, like 'name'*='

Re: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-09-17 Thread Yuriy Shuliga
 reordering.
> > > Basically, the merge phase receives results from data nodes
> > > asynchronously, and messages from different nodes can't be ordered.
> > >
> > > 2.
> > > a. The "tokenize" param name (for @QueryTextField) looks more verbose,
> > > doesn't it?
> > > b, c. What about distributed queries? How will partial results from
> > > nodes be merged?
> > > Does Lucene allow configuring a comparator for data sorting?
> > > Which comparator should Ignite choose to sort results in the merge phase?
> > >
> > > 3. For now the Lucene engine is not configurable at all. E.g. it is
> > > impossible to configure the Tokenizer.
> > > I'd think about possible ways to configure the engine first and only
> > > then go further to discuss/implement complex features
> > > that may depend on the engine config.
> > >
> > >
> > >
> > > On Thu, Aug 29, 2019 at 8:17 PM Yuriy Shuliga 
> wrote:
> > >
> > > > Dear community,
> > > >
> > > > By starting this thread I'd like to open a discussion that will lead
> > > > to contributions in the subject area.
> > > >
> > > > Ignite has indexing capabilities backed by different mechanisms,
> > > > including Lucene.
> > > >
> > > > Currently, Lucene 7.5.0 is used (released last year).
> > > > It is a widespread and mature technology that covers text search
> > > > and beyond (e.g. spatial data indexing).
> > > >
> > > > My goal is to *expose more Lucene functionality to Ignite indexing
> > > > and query mechanisms for text data*.
> > > >
> > > > The request is quite simple at the current stage. It comes from our
> > > > project's needs, but I believe it will be useful to many more people.
> > > > Let's walk through the items and vote on or discuss Jira tickets for
> > > > them.
> > > >
> > > > 1. [trivial] Use dataQuery.getPageSize() to limit search response
> > > > items inside GridLuceneIndex.query(). Currently it calls
> > > > IndexSearcher.search(query, *Integer.MAX_VALUE*), so basically all
> > > > scored matches will be returned, which we do not need in most cases.
> > > >
> > > > 2. [simple] Add sorting. Then a more capable search call can be
> > > > executed: *IndexSearcher.search(query, count, sort)*
> > > > Implementation steps:
> > > > a) Introduce a boolean *sortField* parameter in the *@QueryTextField*
> > > > annotation. If *true*, the field will be indexed but not tokenized.
> > > > Numeric types are preferred here.
> > > > b) Add a *sort* collection to the *TextQuery* constructor. It should
> > > > define the desired sort fields used for querying.
> > > > c) Implement Lucene sort usage in GridLuceneIndex.query().
> > > >
> > > > 3. [moderate] Build complex queries with *TextQuery*, including
> > > > term/query boosting.
> > > > *This section is for voting only, as it requires more detailed work.
> > > > It should be extended if the community is interested in it.*
> > > >
> > > > Looking forward to your comments!
> > > >
> > > > BR,
> > > > Yuriy Shuliha
> > > >
> > >
> > >
> > > --
> > > Best regards,
> > > Andrey V. Mashenkov
> > >
> >
>
>
> --
>
> Best regards,
> Alexei Scherbakov
>


Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-08-29 Thread Yuriy Shuliga
Dear community,

By starting this thread I'd like to open a discussion that will lead to
contributions in the subject area.

Ignite has indexing capabilities backed by different mechanisms,
including Lucene.

Currently, Lucene 7.5.0 is used (released last year).
It is a widespread and mature technology that covers text search
and beyond (e.g. spatial data indexing).

My goal is to *expose more Lucene functionality to Ignite indexing and
query mechanisms for text data*.

The request is quite simple at the current stage. It comes from our project's
needs, but I believe it will be useful to many more people.
Let's walk through the items and vote on or discuss Jira tickets for them.

1. [trivial] Use dataQuery.getPageSize() to limit search response items
inside GridLuceneIndex.query(). Currently it calls
IndexSearcher.search(query, *Integer.MAX_VALUE*), so basically all scored
matches will be returned, which we do not need in most cases.

2. [simple] Add sorting. Then a more capable search call can be
executed: *IndexSearcher.search(query, count, sort)*
Implementation steps:
a) Introduce a boolean *sortField* parameter in the *@QueryTextField*
annotation. If *true*, the field will be indexed but not tokenized. Numeric
types are preferred here.
b) Add a *sort* collection to the *TextQuery* constructor. It should define
the desired sort fields used for querying.
c) Implement Lucene sort usage in GridLuceneIndex.query().

3. [moderate] Build complex queries with *TextQuery*, including
term/query boosting.
*This section is for voting only, as it requires more detailed work. It should
be extended if the community is interested in it.*
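
As a rough illustration of item 2b, the sketch below shows how an ordered
collection of numeric sort fields could be turned into a single comparator,
which is what a reduce step would need in order to merge node responses. The
SortSpecs and SortSpec names are hypothetical; nothing here is existing
Ignite or Lucene API, and documents are modelled as plain maps of numeric
field values. Documents missing a sort field are not handled.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class SortSpecs {
    /** Hypothetical sort field description: a numeric field and a direction. */
    static final class SortSpec {
        final String field;
        final boolean descending;

        SortSpec(String field, boolean descending) {
            this.field = field;
            this.descending = descending;
        }
    }

    /** Builds one comparator applying the specs in order; later specs break ties. */
    static Comparator<Map<String, Long>> comparator(List<SortSpec> specs) {
        Comparator<Map<String, Long>> cmp = (a, b) -> 0;
        for (SortSpec s : specs) {
            Comparator<Map<String, Long>> byField =
                (a, b) -> Long.compare(a.get(s.field), b.get(s.field));
            cmp = cmp.thenComparing(s.descending ? byField.reversed() : byField);
        }
        return cmp;
    }
}
```

The order of the specs matters: the first field is the primary sort key, and
each following field is consulted only when all earlier ones compare equal.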

Looking forward to your comments!

BR,
Yuriy Shuliha


Fwd: Hello

2019-08-28 Thread Yuriy Shuliga
UPD:
Jira ID:  Yuriy_Shuliha
To whom it may concern, please add me as a contributor.

Yuriy

-- Forwarded message -
From: Yuriy Shuliga 
Date: Wed, 28 Aug 2019 at 16:03
Subject: Hello
To: 


Dear Ignite Team!

My name is Yuriy Shuliha.
My current work is dedicated to search services for various businesses.
We are now working with Ignite as our main computation facility and are
interested in developing its TextQuery capabilities, which are backed by
Lucene.

My contribution goal is to extend GridLuceneIndex functionality by:
1) Adding the ability to limit the returned TopDocs.
2) Adding Sort fields to index/query (potentially via a new annotation).
3) Adding Boost to certain indexed fields (by extending the @QueryTextField
annotation).

Looking forward to productive collaboration!

Yuriy Shuliha


Hello

2019-08-28 Thread Yuriy Shuliga
Dear Ignite Team!

My name is Yuriy Shuliha.
My current work is dedicated to search services for various businesses.
We are now working with Ignite as our main computation facility and are
interested in developing its TextQuery capabilities, which are backed by
Lucene.

My contribution goal is to extend GridLuceneIndex functionality by:
1) Adding the ability to limit the returned TopDocs.
2) Adding Sort fields to index/query (potentially via a new annotation).
3) Adding Boost to certain indexed fields (by extending the @QueryTextField
annotation).

Looking forward to productive collaboration!

Yuriy Shuliha