[MTCGA]: new failures in builds [6109456] needs to be handled

2021-08-02 Thread dpavlov . tasks
Hi Igniters,

 I've detected a new issue on TeamCity that needs to be handled. You are more than 
welcome to help.

 *Test with high flaky rate in master 
FailureProcessorThreadDumpThrottlingTest.testThrottlingPerFailureType 
https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=4454424024505616112=%3Cdefault%3E=testDetails
 No changes in the build

 - Here's a reminder of what contributors have agreed to do 
https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute 
 - Should you have any questions please contact dev@ignite.apache.org 

Best Regards,
Apache Ignite TeamCity Bot 
https://github.com/apache/ignite-teamcity-bot
Notification generated at 05:59:25 03-08-2021 


Re: Text Queries Support

2021-08-02 Thread Atri Sharma
Hi Ivan,

Would you like to propose an alternative to Lucene?

Atri

On Mon, 2 Aug 2021, 13:48 Ivan Pavlukhin wrote:

> Folks,
>
> Sorry if I have not read the thread thoroughly enough, but do we consider
> Lucene the obviously right choice? In my understanding, Ignite history
> has shown clearly that the "fastest feature implementation" is not usually
> the best one. Text queries are one example of this. Aren't we about
> to make the same mistake again? FTS is a huge feature, and I do not believe
> there is an easy win for it.

Re: Text Queries Support

2021-08-02 Thread Ivan Pavlukhin
Folks,

Sorry if I have not read the thread thoroughly enough, but do we consider
Lucene the obviously right choice? In my understanding, Ignite history
has shown clearly that the "fastest feature implementation" is not usually
the best one. Text queries are one example of this. Aren't we about
to make the same mistake again? FTS is a huge feature, and I do not believe
there is an easy win for it.

2021-07-27 19:18 GMT+03:00, Atri Sharma:
> Andrey,
>
>> Per-partition Lucene index looks simple to implement, but it may require
>> per-partition SQL to make full-text search expressions work correctly
>> within the SQL query.
> I think that as long as we follow the map-reduce process that we
> already do for other queries, we should be fine.
>
>> Per-partition SQL index may kill the performance. We already tried to do
>> that in Ignite 2. However, the QueryParallelism feature helps to speed up
>> some data-intensive queries, but hurts performance in simple cases, and at
>> some point (e.g. when the number of segments exceeds the number of CPUs)
>> performance rapidly degrades as the number of segments increases.
>
> Yeah, that is always the case, but a global index will be a nightmare
> in terms of concurrency and pessimistic concurrency control will
> anyways kill the benefits, coupled with the metadata requirements.
> What were the specific issues with per partition index?
>>
>> AFAIK, Lucene widely uses bitmap indices that are easy to merge.
>> Maybe the map-reduce technique underneath FTS expressions and some hacks
>> will add only minimal overhead.
>
> Lucene uses many types of indices, but the point here is that
> per-partition Lucene indices can return docIDs and we can merge them in
> the reduce phase. So we are abstracted from the specifics of the internal
> index being used to serve the query.
>
>>
>> > As illustrated by Ilya, we can use Ignite's WAL records to rebuild
>> > Lucene indices. The important thing here is to not treat Lucene
>> > indices as source of truth.
>> To use WAL we should either relay Lucene files to our Page memory or be
>> aware of the Lucene file structure.
>> The first looks tricky, as we should guarantee a contiguous address space
>> in Page memory for reflecting a Lucene file. Maybe a separate managed memory
>> segment with its own rules?
>
> Why not use Lucene's MMapDirectory and map it to our storage classes?
>
>>
>> >> Transactions.
>> >> * Will we support transactions?
>> > Lucene has no concept of transactions.
>> Yes, but we do.
>> The Lucene index may be non-transactional, but users never expect to see
>> uncommitted data.
>> How does this connect with transactional SQL?
> We could have the Lucene writes done as a part of transactions and ack
> back only when it succeeds/fails. WDYT?
>>
>> On Tue, Jul 27, 2021 at 1:36 PM Atri Sharma wrote:
>>
>> > Sorry, I planned on creating a Wiki page for this, but it makes more
>> > sense to be replying here.
>> >
>> > > * How can a Lucene index be split among the nodes?
>> >
>> > We can have partition level indices on each node.
>> >
>> > > * If we have a single index for all partitions on a particular node,
>> > > then how will index records be aware of partitioning?
>> >
>> > Index records don't need to be aware of partitioning -- each Lucene
>> > index is independent.
>> >
>> > > This is important to filter out backup records from the results to
>> > > avoid duplicates.
>> >
>> > We can merge documents from different nodes and remove duplicates as
>> > long as docIDs are globally unique.
>> >
>> > > * How can results from several nodes be merged at the Reduce stage?
>> >
>> > As long as documents have a globally unique docID, Lucene has merge
>> > functions that can merge results from multiple partial results.
>> >
>> > > * Does Lucene support something like a JOIN operation, or others that
>> > > may require data from another partition or index?
>> >
>> > As illustrated by Ilya, Block-Join works for us.
>> >
>> > > If so, then it looks like a multistep query with merging of results at
>> > > intermediate stages, and requires detailed investigation and design.
>> > > It is OK if Ignite has some limitations here, but we would like to
>> > > know about them at an early stage.
>> >
>> > > * How can Lucene files be mapped to the page memory effectively? Is it
>> > > even possible?
>> >
>> > Lucene has Directory implementations which allow storing Lucene
>> > indices on different kinds of file structures. It has an
>> > MMapDirectory that we could use?
>> >
>> > > Otherwise, how to deal with potential OOM on large queries and memory
>> > > capacity planning?
>> >
>> > We can use Lucene's MMapDirectory.
>> >
>> > >
>> > > Persistence.
>> > > * How and what consistency guarantees could we have/expect?
>> >
>> > Lucene does not have WAL logs but is append only
>> >
>> > > It seems we may not be able to write physical records for the Lucene
>> > > index to our WAL. What can we do about this?
>> >
>> > As illustrated by Ilya, we can use Ignite's WAL 
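
The reduce-stage merge discussed in this thread (each partition-level index
returns hits keyed by a globally unique docID, and duplicates coming from
backup copies are dropped) could look roughly like the sketch below. It is
plain Java and does not use Lucene; the ReduceMerge and Hit names are
hypothetical, and it assumes that scores from different partition indices are
directly comparable, which independent indices do not guarantee.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical reduce-side merge of per-partition full-text hits. */
public class ReduceMerge {
    /** One hit from a partition-level index: globally unique docID plus score. */
    public static final class Hit {
        public final String docId;
        public final float score;

        public Hit(String docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /**
     * Merges the hit lists returned by the map stage, keeps the first
     * occurrence of each docID, and returns at most k hits, best score first.
     */
    public static List<Hit> merge(List<List<Hit>> perPartition, int k) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perPartition) {
            all.addAll(hits);
        }

        // Highest score first; ties are broken by docID to keep the order stable.
        Comparator<Hit> byScoreDesc = (a, b) -> Float.compare(b.score, a.score);
        all.sort(byScoreDesc.thenComparing(h -> h.docId));

        Set<String> seen = new HashSet<>();
        List<Hit> top = new ArrayList<>();
        for (Hit h : all) {
            if (top.size() == k) {
                break;
            }
            // A docID seen before is a duplicate (e.g. from a backup copy).
            if (seen.add(h.docId)) {
                top.add(h);
            }
        }
        return top;
    }
}
```

Sorting the concatenated lists keeps the sketch short; a real reducer would
more likely do a k-way merge of lists that are already sorted per partition.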

Apache Ignite Developers - Legacy Mail Archive has been deleted

2021-08-02 Thread Nabble
Dear user,

Your Nabble site "Apache Ignite Developers - Legacy Mail Archive" has been 
deleted.

You can download a backup of this site from the link below.
Nabble will try to keep this backup available for a few months, but this is not 
guaranteed.
If this content is important to you, save this copy as soon as possible.

http://s1.nabble.com/backups/site_152_1065951954.zip

Sincerely,
The Nabble team
