Re: [DISCUSSION] Reserve partitions for CacheQueries

2021-12-13 Thread Ivan Pavlukhin
Thanks for details!

2021-12-13 11:14 GMT+03:00, Maksim Timonin :
> Hi, Ivan!
>
>> Does IndexQuery has separate codebase? Does it share some code with
> ScanQuery
>
> Yes, IndexQuery mostly shares processing with ScanQuery:
> requests/responses, a remote filter. Reducer (merge of query results) on an
> initiator node has its own implementation for IndexQuery (MergeSort), but
> it's built into existing ScanQuery processing.
>
> On Mon, Dec 13, 2021 at 11:00 AM Ivan Pavlukhin 
> wrote:
>
>> > Then, if there are no objections, the short-term plan is:
>>
>> Sounds ok for me.
>>
>> > AFAIR Scan and SQL queries implementations are totally different. Could
>> you tell me how Index queries fit there?
>>
>> I suppose my questions was misleading. Actually I would like to know
>> how code is organized today. AFAIR SQL and scan queries has their own
>> codebases (messages, merging results from different nodes and etc).
>> Does IndexQuery has separate codebase? Does it share some code with
>> ScanQuery on a "query" layer (higher than BTree layer)?
>>
>> 2021-12-10 13:25 GMT+03:00, Maksim Timonin :
>> > Hi!
>> >
>> >> Existing users may already have some logic in place to handle
>> > inconsistencies
>> >
>> > Pavel, I'm not aware of such users but your comment makes sense. So, I'
>> OK
>> > with adding an option for ScanQuery. Naming of the option is debatable.
>> The
>> > name "reservePartitions" looks good, but we actually already reserve a
>> > partition when it is explicitly specified in ScanQuery. Then, it can be
>> > a
>> > bit misleading in the case of explicitly setting this param to `false`
>> with
>> > a specified partition. But we can just mention it in javadocs, that the
>> > setting affects only full scan.
>> >
>> >> AFAIR Scan and SQL queries implementations are totally different.
>> >> Could
>> > you tell me how Index queries fit there?
>> >
>> > IndexQuery scans indexes like SQL does with few optimizations of
>> traversing
>> > BPlusTree, also there are some differences in query processing, also
>> > IndexQuery provides sorted results by default. But I don't expect
>> > inconsistency in results for SQL / IndexQuery for similar queries.
>> > Actually, I should add tests for that and fix failures if any.
>> >
>> >> In an ideal world it would be great to have only one API
>> >
>> > I think it's possible to teach SQL to switch to cache queries for some
>> > cases, or provide such an opportunity with hints or explicit functions.
>> And
>> > these queries might be parts of an SQL query plan. Or we can go even
>> > deeper, maybe it could be like Apache Spark stages, when we can build
>> > our
>> > plan with different types of queries, and the same type provides to
>> > users the opportunity to run only specific stages: DataFrame.sql() or
>> > DataFrame.filter(inline=true/false).
>> >
>> >> Perhaps IndexQuery should also cover regular entry iteration cases
>> >
>> > It's possible too. IndexQuery provides an opportunity to scan the PK
>> index,
>> > then it can start ScanQuery under the hood for some cases.
>> >
>> > But anyway, to make them run through the single API, we should provide
>> the
>> > same guarantees.
>> >
>> > Then, if there are no objections, the short-term plan is:
>> > 1. Implement partition reservation for IndexQuery.
>> > 2. Add the option for ScanQuery.
>> > 3. Make tests that show that IndexQuery / SQL / ScanQuery is
>> > replaceable
>> > for some types of queries in terms of result consistency.
>> > 4. Then discuss again, how we can integrate them together.
>> >
>> >
>> > On Fri, Dec 10, 2021 at 11:41 AM Ivan Pavlukhin 
>> > wrote:
>> >
>> >> Actually I brought a point about using SQL queries instead of scan
>> >> queries because I worry about inconsistency between different
>> >> implementations. AFAIR Scan and SQL queries implementations are
>> >> totally different. Could you tell me how Index queries fit there?
>> >>
>> >> My general ideas are as follows:
>> >> 1. In an ideal world it would be great to have only one API (to rule
>> >> them all) and implementation to cover all cases (i.e. scan, index,
>> >> SQL). Also, I wonder how other vendors tackle this?
>> >> 2. New IndexQuery might not mimic ScanQuery behavior but instead have
>> >> a correct/expected one (including partition reservation). Perhaps
>> >> IndexQuery should also cover regular entry iteration (scan) cases and
>> >> become a new ground for generalized scan/index queries.
>> >>
>> >> 2021-12-09 17:08 GMT+03:00, Pavel Tupitsyn :
>> >> > IndexQuery is experimental, so we can indeed make it reserve
>> partitions
>> >> by
>> >> > default. No objections here.
>> >> >
>> >> > But with ScanQuery, I think it's better to add a "reservePartitions"
>> >> > property so that default behavior is not affected.
>> >> > Existing users may already have some logic in place to handle
>> >> > inconsistencies. So if we enable reservation by default,
>> >> > they'll get an unnecessary performance degradation.
>> >> >
>> >> > On Thu, Dec 9, 

Re: [DISCUSSION] Reserve partitions for CacheQueries

2021-12-13 Thread Maksim Timonin
Hi, Ivan!

> Does IndexQuery has separate codebase? Does it share some code with
ScanQuery

Yes, IndexQuery mostly shares processing with ScanQuery:
requests/responses, a remote filter. Reducer (merge of query results) on an
initiator node has its own implementation for IndexQuery (MergeSort), but
it's built into existing ScanQuery processing.

On Mon, Dec 13, 2021 at 11:00 AM Ivan Pavlukhin  wrote:

> > Then, if there are no objections, the short-term plan is:
>
> Sounds ok for me.
>
> > AFAIR Scan and SQL queries implementations are totally different. Could
> you tell me how Index queries fit there?
>
> I suppose my questions was misleading. Actually I would like to know
> how code is organized today. AFAIR SQL and scan queries has their own
> codebases (messages, merging results from different nodes and etc).
> Does IndexQuery has separate codebase? Does it share some code with
> ScanQuery on a "query" layer (higher than BTree layer)?
>
> 2021-12-10 13:25 GMT+03:00, Maksim Timonin :
> > Hi!
> >
> >> Existing users may already have some logic in place to handle
> > inconsistencies
> >
> > Pavel, I'm not aware of such users but your comment makes sense. So, I'
> OK
> > with adding an option for ScanQuery. Naming of the option is debatable.
> The
> > name "reservePartitions" looks good, but we actually already reserve a
> > partition when it is explicitly specified in ScanQuery. Then, it can be a
> > bit misleading in the case of explicitly setting this param to `false`
> with
> > a specified partition. But we can just mention it in javadocs, that the
> > setting affects only full scan.
> >
> >> AFAIR Scan and SQL queries implementations are totally different. Could
> > you tell me how Index queries fit there?
> >
> > IndexQuery scans indexes like SQL does with few optimizations of
> traversing
> > BPlusTree, also there are some differences in query processing, also
> > IndexQuery provides sorted results by default. But I don't expect
> > inconsistency in results for SQL / IndexQuery for similar queries.
> > Actually, I should add tests for that and fix failures if any.
> >
> >> In an ideal world it would be great to have only one API
> >
> > I think it's possible to teach SQL to switch to cache queries for some
> > cases, or provide such an opportunity with hints or explicit functions.
> And
> > these queries might be parts of an SQL query plan. Or we can go even
> > deeper, maybe it could be like Apache Spark stages, when we can build our
> > plan with different types of queries, and the same type provides to
> > users the opportunity to run only specific stages: DataFrame.sql() or
> > DataFrame.filter(inline=true/false).
> >
> >> Perhaps IndexQuery should also cover regular entry iteration cases
> >
> > It's possible too. IndexQuery provides an opportunity to scan the PK
> index,
> > then it can start ScanQuery under the hood for some cases.
> >
> > But anyway, to make them run through the single API, we should provide
> the
> > same guarantees.
> >
> > Then, if there are no objections, the short-term plan is:
> > 1. Implement partition reservation for IndexQuery.
> > 2. Add the option for ScanQuery.
> > 3. Make tests that show that IndexQuery / SQL / ScanQuery is replaceable
> > for some types of queries in terms of result consistency.
> > 4. Then discuss again, how we can integrate them together.
> >
> >
> > On Fri, Dec 10, 2021 at 11:41 AM Ivan Pavlukhin 
> > wrote:
> >
> >> Actually I brought a point about using SQL queries instead of scan
> >> queries because I worry about inconsistency between different
> >> implementations. AFAIR Scan and SQL queries implementations are
> >> totally different. Could you tell me how Index queries fit there?
> >>
> >> My general ideas are as follows:
> >> 1. In an ideal world it would be great to have only one API (to rule
> >> them all) and implementation to cover all cases (i.e. scan, index,
> >> SQL). Also, I wonder how other vendors tackle this?
> >> 2. New IndexQuery might not mimic ScanQuery behavior but instead have
> >> a correct/expected one (including partition reservation). Perhaps
> >> IndexQuery should also cover regular entry iteration (scan) cases and
> >> become a new ground for generalized scan/index queries.
> >>
> >> 2021-12-09 17:08 GMT+03:00, Pavel Tupitsyn :
> >> > IndexQuery is experimental, so we can indeed make it reserve
> partitions
> >> by
> >> > default. No objections here.
> >> >
> >> > But with ScanQuery, I think it's better to add a "reservePartitions"
> >> > property so that default behavior is not affected.
> >> > Existing users may already have some logic in place to handle
> >> > inconsistencies. So if we enable reservation by default,
> >> > they'll get an unnecessary performance degradation.
> >> >
> >> > On Thu, Dec 9, 2021 at 4:42 PM Maksim Timonin <
> timoninma...@apache.org>
> >> > wrote:
> >> >
> >> >> Hi, Ivan, Pavel! Thanks for your responses.
> >> >>
> >> >> > But is there a real need/benefit from using scan queries 

Re: [DISCUSSION] Reserve partitions for CacheQueries

2021-12-13 Thread Ivan Pavlukhin
> Then, if there are no objections, the short-term plan is:

Sounds ok for me.

> AFAIR Scan and SQL queries implementations are totally different. Could you 
> tell me how Index queries fit there?

I suppose my questions was misleading. Actually I would like to know
how code is organized today. AFAIR SQL and scan queries has their own
codebases (messages, merging results from different nodes and etc).
Does IndexQuery has separate codebase? Does it share some code with
ScanQuery on a "query" layer (higher than BTree layer)?

2021-12-10 13:25 GMT+03:00, Maksim Timonin :
> Hi!
>
>> Existing users may already have some logic in place to handle
> inconsistencies
>
> Pavel, I'm not aware of such users but your comment makes sense. So, I' OK
> with adding an option for ScanQuery. Naming of the option is debatable. The
> name "reservePartitions" looks good, but we actually already reserve a
> partition when it is explicitly specified in ScanQuery. Then, it can be a
> bit misleading in the case of explicitly setting this param to `false` with
> a specified partition. But we can just mention it in javadocs, that the
> setting affects only full scan.
>
>> AFAIR Scan and SQL queries implementations are totally different. Could
> you tell me how Index queries fit there?
>
> IndexQuery scans indexes like SQL does with few optimizations of traversing
> BPlusTree, also there are some differences in query processing, also
> IndexQuery provides sorted results by default. But I don't expect
> inconsistency in results for SQL / IndexQuery for similar queries.
> Actually, I should add tests for that and fix failures if any.
>
>> In an ideal world it would be great to have only one API
>
> I think it's possible to teach SQL to switch to cache queries for some
> cases, or provide such an opportunity with hints or explicit functions. And
> these queries might be parts of an SQL query plan. Or we can go even
> deeper, maybe it could be like Apache Spark stages, when we can build our
> plan with different types of queries, and the same type provides to
> users the opportunity to run only specific stages: DataFrame.sql() or
> DataFrame.filter(inline=true/false).
>
>> Perhaps IndexQuery should also cover regular entry iteration cases
>
> It's possible too. IndexQuery provides an opportunity to scan the PK index,
> then it can start ScanQuery under the hood for some cases.
>
> But anyway, to make them run through the single API, we should provide the
> same guarantees.
>
> Then, if there are no objections, the short-term plan is:
> 1. Implement partition reservation for IndexQuery.
> 2. Add the option for ScanQuery.
> 3. Make tests that show that IndexQuery / SQL / ScanQuery is replaceable
> for some types of queries in terms of result consistency.
> 4. Then discuss again, how we can integrate them together.
>
>
> On Fri, Dec 10, 2021 at 11:41 AM Ivan Pavlukhin 
> wrote:
>
>> Actually I brought a point about using SQL queries instead of scan
>> queries because I worry about inconsistency between different
>> implementations. AFAIR Scan and SQL queries implementations are
>> totally different. Could you tell me how Index queries fit there?
>>
>> My general ideas are as follows:
>> 1. In an ideal world it would be great to have only one API (to rule
>> them all) and implementation to cover all cases (i.e. scan, index,
>> SQL). Also, I wonder how other vendors tackle this?
>> 2. New IndexQuery might not mimic ScanQuery behavior but instead have
>> a correct/expected one (including partition reservation). Perhaps
>> IndexQuery should also cover regular entry iteration (scan) cases and
>> become a new ground for generalized scan/index queries.
>>
>> 2021-12-09 17:08 GMT+03:00, Pavel Tupitsyn :
>> > IndexQuery is experimental, so we can indeed make it reserve partitions
>> by
>> > default. No objections here.
>> >
>> > But with ScanQuery, I think it's better to add a "reservePartitions"
>> > property so that default behavior is not affected.
>> > Existing users may already have some logic in place to handle
>> > inconsistencies. So if we enable reservation by default,
>> > they'll get an unnecessary performance degradation.
>> >
>> > On Thu, Dec 9, 2021 at 4:42 PM Maksim Timonin 
>> > wrote:
>> >
>> >> Hi, Ivan, Pavel! Thanks for your responses.
>> >>
>> >> > But is there a real need/benefit from using scan queries over
>> primitive
>> >> SQL queries today
>> >>
>> >> In addition to Pavel's response, I'd like to add about IndexQuery.
>> >> This
>> >> query is in experimental status, but it shows great performance and
>> beats
>> >> SQL for *some* type of queries. For IndexQuery we can leverage on
>> >> understanding that we use only index and apply some optimizations:
>> >> skip
>> >> boundaries check (h2 Select.isConditionMet), apply filtering by inline
>> IO
>> >> instead of extracting fields from BinaryObjects, also we skip some H2
>> >> related stuff. Also it's not implemented yet, but for IndexQuery we
>> >> can
>> >> access data 

Re: [DISCUSSION] Reserve partitions for CacheQueries

2021-12-10 Thread Maksim Timonin
Hi!

> Existing users may already have some logic in place to handle
inconsistencies

Pavel, I'm not aware of such users but your comment makes sense. So, I' OK
with adding an option for ScanQuery. Naming of the option is debatable. The
name "reservePartitions" looks good, but we actually already reserve a
partition when it is explicitly specified in ScanQuery. Then, it can be a
bit misleading in the case of explicitly setting this param to `false` with
a specified partition. But we can just mention it in javadocs, that the
setting affects only full scan.

> AFAIR Scan and SQL queries implementations are totally different. Could
you tell me how Index queries fit there?

IndexQuery scans indexes like SQL does with few optimizations of traversing
BPlusTree, also there are some differences in query processing, also
IndexQuery provides sorted results by default. But I don't expect
inconsistency in results for SQL / IndexQuery for similar queries.
Actually, I should add tests for that and fix failures if any.

> In an ideal world it would be great to have only one API

I think it's possible to teach SQL to switch to cache queries for some
cases, or provide such an opportunity with hints or explicit functions. And
these queries might be parts of an SQL query plan. Or we can go even
deeper, maybe it could be like Apache Spark stages, when we can build our
plan with different types of queries, and the same type provides to
users the opportunity to run only specific stages: DataFrame.sql() or
DataFrame.filter(inline=true/false).

> Perhaps IndexQuery should also cover regular entry iteration cases

It's possible too. IndexQuery provides an opportunity to scan the PK index,
then it can start ScanQuery under the hood for some cases.

But anyway, to make them run through the single API, we should provide the
same guarantees.

Then, if there are no objections, the short-term plan is:
1. Implement partition reservation for IndexQuery.
2. Add the option for ScanQuery.
3. Make tests that show that IndexQuery / SQL / ScanQuery is replaceable
for some types of queries in terms of result consistency.
4. Then discuss again, how we can integrate them together.


On Fri, Dec 10, 2021 at 11:41 AM Ivan Pavlukhin  wrote:

> Actually I brought a point about using SQL queries instead of scan
> queries because I worry about inconsistency between different
> implementations. AFAIR Scan and SQL queries implementations are
> totally different. Could you tell me how Index queries fit there?
>
> My general ideas are as follows:
> 1. In an ideal world it would be great to have only one API (to rule
> them all) and implementation to cover all cases (i.e. scan, index,
> SQL). Also, I wonder how other vendors tackle this?
> 2. New IndexQuery might not mimic ScanQuery behavior but instead have
> a correct/expected one (including partition reservation). Perhaps
> IndexQuery should also cover regular entry iteration (scan) cases and
> become a new ground for generalized scan/index queries.
>
> 2021-12-09 17:08 GMT+03:00, Pavel Tupitsyn :
> > IndexQuery is experimental, so we can indeed make it reserve partitions
> by
> > default. No objections here.
> >
> > But with ScanQuery, I think it's better to add a "reservePartitions"
> > property so that default behavior is not affected.
> > Existing users may already have some logic in place to handle
> > inconsistencies. So if we enable reservation by default,
> > they'll get an unnecessary performance degradation.
> >
> > On Thu, Dec 9, 2021 at 4:42 PM Maksim Timonin 
> > wrote:
> >
> >> Hi, Ivan, Pavel! Thanks for your responses.
> >>
> >> > But is there a real need/benefit from using scan queries over
> primitive
> >> SQL queries today
> >>
> >> In addition to Pavel's response, I'd like to add about IndexQuery. This
> >> query is in experimental status, but it shows great performance and
> beats
> >> SQL for *some* type of queries. For IndexQuery we can leverage on
> >> understanding that we use only index and apply some optimizations: skip
> >> boundaries check (h2 Select.isConditionMet), apply filtering by inline
> IO
> >> instead of extracting fields from BinaryObjects, also we skip some H2
> >> related stuff. Also it's not implemented yet, but for IndexQuery we can
> >> access data pages sequentially instead of randomly (by iterating with
> >> IndexRow cursor by batches), and I expect it may make performance better
> >> in
> >> some highload cases.
> >>
> >> > it is desirable that successful (but may be strange) scenarios
> continue
> >> to work after fix (and not fail e.g. due to some introduced exception)
> >>
> >> > Partition reservation should be opt-in if we decide to proceed.
> >>
> >> With the approach I proposed we can reach it. Without specified query
> >> timeout (and cache queries don't have a public API for setting timeout),
> >> we
> >> can retry our attempts to reserve partitions while we do not succeed.
> So,
> >> users may get some increase in query time execution under unstable

Re: [DISCUSSION] Reserve partitions for CacheQueries

2021-12-10 Thread Ivan Pavlukhin
Actually I brought a point about using SQL queries instead of scan
queries because I worry about inconsistency between different
implementations. AFAIR Scan and SQL queries implementations are
totally different. Could you tell me how Index queries fit there?

My general ideas are as follows:
1. In an ideal world it would be great to have only one API (to rule
them all) and implementation to cover all cases (i.e. scan, index,
SQL). Also, I wonder how other vendors tackle this?
2. New IndexQuery might not mimic ScanQuery behavior but instead have
a correct/expected one (including partition reservation). Perhaps
IndexQuery should also cover regular entry iteration (scan) cases and
become a new ground for generalized scan/index queries.

2021-12-09 17:08 GMT+03:00, Pavel Tupitsyn :
> IndexQuery is experimental, so we can indeed make it reserve partitions by
> default. No objections here.
>
> But with ScanQuery, I think it's better to add a "reservePartitions"
> property so that default behavior is not affected.
> Existing users may already have some logic in place to handle
> inconsistencies. So if we enable reservation by default,
> they'll get an unnecessary performance degradation.
>
> On Thu, Dec 9, 2021 at 4:42 PM Maksim Timonin 
> wrote:
>
>> Hi, Ivan, Pavel! Thanks for your responses.
>>
>> > But is there a real need/benefit from using scan queries over primitive
>> SQL queries today
>>
>> In addition to Pavel's response, I'd like to add about IndexQuery. This
>> query is in experimental status, but it shows great performance and beats
>> SQL for *some* type of queries. For IndexQuery we can leverage on
>> understanding that we use only index and apply some optimizations: skip
>> boundaries check (h2 Select.isConditionMet), apply filtering by inline IO
>> instead of extracting fields from BinaryObjects, also we skip some H2
>> related stuff. Also it's not implemented yet, but for IndexQuery we can
>> access data pages sequentially instead of randomly (by iterating with
>> IndexRow cursor by batches), and I expect it may make performance better
>> in
>> some highload cases.
>>
>> > it is desirable that successful (but may be strange) scenarios continue
>> to work after fix (and not fail e.g. due to some introduced exception)
>>
>> > Partition reservation should be opt-in if we decide to proceed.
>>
>> With the approach I proposed we can reach it. Without specified query
>> timeout (and cache queries don't have a public API for setting timeout),
>> we
>> can retry our attempts to reserve partitions while we do not succeed. So,
>> users may get some increase in query time execution under unstable
>> topology. But in exchange users will get more stable results, and avoid
>> exceptions like in the ticket I showed [1].
>>
>> We can implement this logic for IndexQuery only (as it's only
>> experimental
>> since 2.12), and extend it on ScanQuery later after stabilizing with
>> IndexQuery. WDYT?
>>
>>
>> [1] https://issues.apache.org/jira/browse/IGNITE-12591
>>
>> On Thu, Dec 9, 2021 at 10:39 AM Pavel Tupitsyn 
>> wrote:
>>
>> > Agree with Ivan regarding compatibility.
>> > Partition reservation should be opt-in if we decide to proceed.
>> >
>> > > is there a real need/benefit from using
>> > > scan queries over primitive SQL queries today
>> >
>> > SQL requires additional configuration (QueryEntity) and involves memory
>> and
>> > CPU overhead to maintain the indexes.
>> > Especially if the query is used rarely (maybe some maintenance tasks),
>> scan
>> > is a better option.
>> >
>> > On Thu, Dec 9, 2021 at 10:12 AM Ivan Pavlukhin 
>> > wrote:
>> >
>> > > Hi Maksim,
>> > >
>> > > Thank you for looking into this. While fixing wrong/surprising
>> > > behavior is very important, I also have some concerns, let's say,
>> > > from
>> > > different angle of view:
>> > > 1. From a first glance it seems that similar behavior of scan and SQL
>> > > queries is a good idea. But is there a real need/benefit from using
>> > > scan queries over primitive SQL queries today (e.g. SELECT * FROM
>> > > table1)?
>> > > 2. Also we need to carefully think about compatibility. It is
>> > > desirable that successful (but may be strange) scenarios continue to
>> > > work after fix (and not fail e.g. due to some introduced exception).
>> > >
>> > > Regarding partition reservation for long-time open cursors I guess
>> > > some ideas might be found in LAZY mode for SQL queries [1].
>> > >
>> > > [1] https://issues.apache.org/jira/browse/IGNITE-9171
>> > >
>> > > 2021-12-08 20:11 GMT+03:00, Maksim Timonin :
>> > > > Hi, Igniters!
>> > > >
>> > > > There is a known issue that ScanQuery on unstable topology returns
>> > > > incorrect results: duplicates data [1] or fails with an exception
>> [2].
>> > > The
>> > > > reason is ScanQuery doesn't reserve partitions.
>> > > >
>> > > > IndexQuery shares the same query processing as ScanQuery, and then
>> > > > it
>> > is
>> > > > also affected by unstable topology. I want to fix it for
>> > > > 

Re: [DISCUSSION] Reserve partitions for CacheQueries

2021-12-09 Thread Pavel Tupitsyn
IndexQuery is experimental, so we can indeed make it reserve partitions by
default. No objections here.

But with ScanQuery, I think it's better to add a "reservePartitions"
property so that default behavior is not affected.
Existing users may already have some logic in place to handle
inconsistencies. So if we enable reservation by default,
they'll get an unnecessary performance degradation.

On Thu, Dec 9, 2021 at 4:42 PM Maksim Timonin 
wrote:

> Hi, Ivan, Pavel! Thanks for your responses.
>
> > But is there a real need/benefit from using scan queries over primitive
> SQL queries today
>
> In addition to Pavel's response, I'd like to add about IndexQuery. This
> query is in experimental status, but it shows great performance and beats
> SQL for *some* type of queries. For IndexQuery we can leverage on
> understanding that we use only index and apply some optimizations: skip
> boundaries check (h2 Select.isConditionMet), apply filtering by inline IO
> instead of extracting fields from BinaryObjects, also we skip some H2
> related stuff. Also it's not implemented yet, but for IndexQuery we can
> access data pages sequentially instead of randomly (by iterating with
> IndexRow cursor by batches), and I expect it may make performance better in
> some highload cases.
>
> > it is desirable that successful (but may be strange) scenarios continue
> to work after fix (and not fail e.g. due to some introduced exception)
>
> > Partition reservation should be opt-in if we decide to proceed.
>
> With the approach I proposed we can reach it. Without specified query
> timeout (and cache queries don't have a public API for setting timeout), we
> can retry our attempts to reserve partitions while we do not succeed. So,
> users may get some increase in query time execution under unstable
> topology. But in exchange users will get more stable results, and avoid
> exceptions like in the ticket I showed [1].
>
> We can implement this logic for IndexQuery only (as it's only experimental
> since 2.12), and extend it on ScanQuery later after stabilizing with
> IndexQuery. WDYT?
>
>
> [1] https://issues.apache.org/jira/browse/IGNITE-12591
>
> On Thu, Dec 9, 2021 at 10:39 AM Pavel Tupitsyn 
> wrote:
>
> > Agree with Ivan regarding compatibility.
> > Partition reservation should be opt-in if we decide to proceed.
> >
> > > is there a real need/benefit from using
> > > scan queries over primitive SQL queries today
> >
> > SQL requires additional configuration (QueryEntity) and involves memory
> and
> > CPU overhead to maintain the indexes.
> > Especially if the query is used rarely (maybe some maintenance tasks),
> scan
> > is a better option.
> >
> > On Thu, Dec 9, 2021 at 10:12 AM Ivan Pavlukhin 
> > wrote:
> >
> > > Hi Maksim,
> > >
> > > Thank you for looking into this. While fixing wrong/surprising
> > > behavior is very important, I also have some concerns, let's say, from
> > > different angle of view:
> > > 1. From a first glance it seems that similar behavior of scan and SQL
> > > queries is a good idea. But is there a real need/benefit from using
> > > scan queries over primitive SQL queries today (e.g. SELECT * FROM
> > > table1)?
> > > 2. Also we need to carefully think about compatibility. It is
> > > desirable that successful (but may be strange) scenarios continue to
> > > work after fix (and not fail e.g. due to some introduced exception).
> > >
> > > Regarding partition reservation for long-time open cursors I guess
> > > some ideas might be found in LAZY mode for SQL queries [1].
> > >
> > > [1] https://issues.apache.org/jira/browse/IGNITE-9171
> > >
> > > 2021-12-08 20:11 GMT+03:00, Maksim Timonin :
> > > > Hi, Igniters!
> > > >
> > > > There is a known issue that ScanQuery on unstable topology returns
> > > > incorrect results: duplicates data [1] or fails with an exception
> [2].
> > > The
> > > > reason is ScanQuery doesn't reserve partitions.
> > > >
> > > > IndexQuery shares the same query processing as ScanQuery, and then it
> > is
> > > > also affected by unstable topology. I want to fix it for IndexQuery.
> > > > IndexQuery should provide the same stability as SQL queries do - no
> > > > occasional failures or data duplication.
> > > >
> > > > I dived into the SQL query processing and found that we can unify
> logic
> > > for
> > > > SQL queries and cache queries (ScanQuery, IndexQuery, TextQuery):
> > > >
> > > > 1. Currently, cache queries use `GridCacheQueryAdapter#nodes` to find
> > > nodes
> > > > to run a query. From the other side, SQL processing uses for the same
> > > goal
> > > > `ReducePartitionMapper#nodesForPartitions`. It looks like those
> methods
> > > > have some slight differences, but in the common, they do the same.
> So,
> > I
> > > > propose to make a single method for both cases.
> > > >
> > > > 2. SQL processing uses `PartitionReservationManager` for reserve
> > > partitions
> > > > on the map side. I propose to move it to the ignite-core module and
> > start
> > > > using it for 

Re: [DISCUSSION] Reserve partitions for CacheQueries

2021-12-09 Thread Maksim Timonin
Hi, Ivan, Pavel! Thanks for your responses.

> But is there a real need/benefit from using scan queries over primitive
SQL queries today

In addition to Pavel's response, I'd like to add about IndexQuery. This
query is in experimental status, but it shows great performance and beats
SQL for *some* type of queries. For IndexQuery we can leverage on
understanding that we use only index and apply some optimizations: skip
boundaries check (h2 Select.isConditionMet), apply filtering by inline IO
instead of extracting fields from BinaryObjects, also we skip some H2
related stuff. Also it's not implemented yet, but for IndexQuery we can
access data pages sequentially instead of randomly (by iterating with
IndexRow cursor by batches), and I expect it may make performance better in
some highload cases.

> it is desirable that successful (but may be strange) scenarios continue
to work after fix (and not fail e.g. due to some introduced exception)

> Partition reservation should be opt-in if we decide to proceed.

With the approach I proposed we can reach it. Without specified query
timeout (and cache queries don't have a public API for setting timeout), we
can retry our attempts to reserve partitions while we do not succeed. So,
users may get some increase in query time execution under unstable
topology. But in exchange users will get more stable results, and avoid
exceptions like in the ticket I showed [1].

We can implement this logic for IndexQuery only (as it's only experimental
since 2.12), and extend it on ScanQuery later after stabilizing with
IndexQuery. WDYT?


[1] https://issues.apache.org/jira/browse/IGNITE-12591

On Thu, Dec 9, 2021 at 10:39 AM Pavel Tupitsyn  wrote:

> Agree with Ivan regarding compatibility.
> Partition reservation should be opt-in if we decide to proceed.
>
> > is there a real need/benefit from using
> > scan queries over primitive SQL queries today
>
> SQL requires additional configuration (QueryEntity) and involves memory and
> CPU overhead to maintain the indexes.
> Especially if the query is used rarely (maybe some maintenance tasks), scan
> is a better option.
>
> On Thu, Dec 9, 2021 at 10:12 AM Ivan Pavlukhin 
> wrote:
>
> > Hi Maksim,
> >
> > Thank you for looking into this. While fixing wrong/surprising
> > behavior is very important, I also have some concerns, let's say, from
> > different angle of view:
> > 1. From a first glance it seems that similar behavior of scan and SQL
> > queries is a good idea. But is there a real need/benefit from using
> > scan queries over primitive SQL queries today (e.g. SELECT * FROM
> > table1)?
> > 2. Also we need to carefully think about compatibility. It is
> > desirable that successful (but may be strange) scenarios continue to
> > work after fix (and not fail e.g. due to some introduced exception).
> >
> > Regarding partition reservation for long-time open cursors I guess
> > some ideas might be found in LAZY mode for SQL queries [1].
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-9171
> >
> > 2021-12-08 20:11 GMT+03:00, Maksim Timonin :
> > > Hi, Igniters!
> > >
> > > There is a known issue that ScanQuery on unstable topology returns
> > > incorrect results: duplicates data [1] or fails with an exception [2].
> > The
> > > reason is ScanQuery doesn't reserve partitions.
> > >
> > > IndexQuery shares the same query processing as ScanQuery, and then it
> is
> > > also affected by unstable topology. I want to fix it for IndexQuery.
> > > IndexQuery should provide the same stability as SQL queries do - no
> > > occasional failures or data duplication.
> > >
> > > I dived into the SQL query processing and found that we can unify logic
> > for
> > > SQL queries and cache queries (ScanQuery, IndexQuery, TextQuery):
> > >
> > > 1. Currently, cache queries use `GridCacheQueryAdapter#nodes` to find
> > nodes
> > > to run a query. From the other side, SQL processing uses for the same
> > goal
> > > `ReducePartitionMapper#nodesForPartitions`. It looks like those methods
> > > have some slight differences, but in the common, they do the same. So,
> I
> > > propose to make a single method for both cases.
> > >
> > > 2. SQL processing uses `PartitionReservationManager` for reserve
> > partitions
> > > on the map side. I propose to move it to the ignite-core module and
> start
> > > using it for the cache queries.
> > >
> > > 3. Implement retries for the cache queries in case we failed to reserve
> > > partitions on the map side.
> > >
> > > Currently, I see a downside of reserving partitions for cache queries:
> > > cache queries are lazy. And the time of partition reservation depends
> on
> > a
> > > user's application code (how fast a cursor is iterated and closed).
> > AFAIU,
> > > it's not very good to have a partition in reserve too long. Please,
> > correct
> > > me if I'm wrong here.
> > >
> > > But from the other side, Ignite reserves partitions for ScanQuery when
> a
> > > partition has been specified as a ScanQuery parameter, and 

Re: [DISCUSSION] Reserve partitions for CacheQueries

2021-12-08 Thread Pavel Tupitsyn
Agree with Ivan regarding compatibility.
Partition reservation should be opt-in if we decide to proceed.

> is there a real need/benefit from using
> scan queries over primitive SQL queries today

SQL requires additional configuration (QueryEntity) and involves memory and
CPU overhead to maintain the indexes.
Especially if the query is used rarely (maybe some maintenance tasks), scan
is a better option.

On Thu, Dec 9, 2021 at 10:12 AM Ivan Pavlukhin  wrote:

> Hi Maksim,
>
> Thank you for looking into this. While fixing wrong/surprising
> behavior is very important, I also have some concerns, let's say, from
> different angle of view:
> 1. From a first glance it seems that similar behavior of scan and SQL
> queries is a good idea. But is there a real need/benefit from using
> scan queries over primitive SQL queries today (e.g. SELECT * FROM
> table1)?
> 2. Also we need to carefully think about compatibility. It is
> desirable that successful (but may be strange) scenarios continue to
> work after fix (and not fail e.g. due to some introduced exception).
>
> Regarding partition reservation for long-time open cursors I guess
> some ideas might be found in LAZY mode for SQL queries [1].
>
> [1] https://issues.apache.org/jira/browse/IGNITE-9171
>
> 2021-12-08 20:11 GMT+03:00, Maksim Timonin :
> > Hi, Igniters!
> >
> > There is a known issue that ScanQuery on unstable topology returns
> > incorrect results: duplicates data [1] or fails with an exception [2].
> The
> > reason is ScanQuery doesn't reserve partitions.
> >
> > IndexQuery shares the same query processing as ScanQuery, and then it is
> > also affected by unstable topology. I want to fix it for IndexQuery.
> > IndexQuery should provide the same stability as SQL queries do - no
> > occasional failures or data duplication.
> >
> > I dived into the SQL query processing and found that we can unify logic
> for
> > SQL queries and cache queries (ScanQuery, IndexQuery, TextQuery):
> >
> > 1. Currently, cache queries use `GridCacheQueryAdapter#nodes` to find
> nodes
> > to run a query. From the other side, SQL processing uses for the same
> goal
> > `ReducePartitionMapper#nodesForPartitions`. It looks like those methods
> > have some slight differences, but in the common, they do the same. So, I
> > propose to make a single method for both cases.
> >
> > 2. SQL processing uses `PartitionReservationManager` for reserve
> partitions
> > on the map side. I propose to move it to the ignite-core module and start
> > using it for the cache queries.
> >
> > 3. Implement retries for the cache queries in case we failed to reserve
> > partitions on the map side.
> >
> > Currently, I see a downside of reserving partitions for cache queries:
> > cache queries are lazy. And the time of partition reservation depends on
> a
> > user's application code (how fast a cursor is iterated and closed).
> AFAIU,
> > it's not very good to have a partition in reserve too long. Please,
> correct
> > me if I'm wrong here.
> >
> > But from the other side, Ignite reserves partitions for ScanQuery when a
> > partition has been specified as a ScanQuery parameter, and Ignite
> reserves
> > partitions for SQL with the flag lazy=true. Also:
> > - IndexQuery: I expect simple queries that will return a relatively small
> > amount of data. Then partitions wouldn't be reserved too much time.
> > - the same is for TextQuery - it returns a limited amount of data (due to
> > the Lucene logic).
> > - full ScanQuery it's not in use much, AFAIK. So, it's by default a
> pretty
> > heavy operation.
> >
> > So, I think it's safe to reserve partitions in any case. But there could
> be
> > an alternative optimistic approach, smth like that:
> > 1. Check that topology is stable and waiting while it's not stabilized.
> > 2. Run a query on a stable cluster.
> > 3. In cases when a cluster becomes unstable during query execution - try
> to
> > reserve partitions at runtime (query should be aware of topology changes)
> > and fail in case of reservation failure (if a user already fetched some
> > data).
> >
> > I don't like the idea of this optimistic approach because in case a user
> > got some data, we don't have a better solution than to fail a query in
> case
> > of cluster instability and reservation failure.
> >
> > Igniters, WDYT?
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-12591
> > [2] https://issues.apache.org/jira/browse/IGNITE-16031
> >
>
>
> --
>
> Best regards,
> Ivan Pavlukhin
>


Re: [DISCUSSION] Reserve partitions for CacheQueries

2021-12-08 Thread Ivan Pavlukhin
Hi Maksim,

Thank you for looking into this. While fixing wrong/surprising
behavior is very important, I also have some concerns, let's say, from
different angle of view:
1. From a first glance it seems that similar behavior of scan and SQL
queries is a good idea. But is there a real need/benefit from using
scan queries over primitive SQL queries today (e.g. SELECT * FROM
table1)?
2. Also we need to carefully think about compatibility. It is
desirable that successful (but may be strange) scenarios continue to
work after fix (and not fail e.g. due to some introduced exception).

Regarding partition reservation for long-time open cursors I guess
some ideas might be found in LAZY mode for SQL queries [1].

[1] https://issues.apache.org/jira/browse/IGNITE-9171

2021-12-08 20:11 GMT+03:00, Maksim Timonin :
> Hi, Igniters!
>
> There is a known issue that ScanQuery on unstable topology returns
> incorrect results: duplicates data [1] or fails with an exception [2]. The
> reason is ScanQuery doesn't reserve partitions.
>
> IndexQuery shares the same query processing as ScanQuery, and then it is
> also affected by unstable topology. I want to fix it for IndexQuery.
> IndexQuery should provide the same stability as SQL queries do - no
> occasional failures or data duplication.
>
> I dived into the SQL query processing and found that we can unify logic for
> SQL queries and cache queries (ScanQuery, IndexQuery, TextQuery):
>
> 1. Currently, cache queries use `GridCacheQueryAdapter#nodes` to find nodes
> to run a query. From the other side, SQL processing uses for the same goal
> `ReducePartitionMapper#nodesForPartitions`. It looks like those methods
> have some slight differences, but in the common, they do the same. So, I
> propose to make a single method for both cases.
>
> 2. SQL processing uses `PartitionReservationManager` for reserve partitions
> on the map side. I propose to move it to the ignite-core module and start
> using it for the cache queries.
>
> 3. Implement retries for the cache queries in case we failed to reserve
> partitions on the map side.
>
> Currently, I see a downside of reserving partitions for cache queries:
> cache queries are lazy. And the time of partition reservation depends on a
> user's application code (how fast a cursor is iterated and closed). AFAIU,
> it's not very good to have a partition in reserve too long. Please, correct
> me if I'm wrong here.
>
> But from the other side, Ignite reserves partitions for ScanQuery when a
> partition has been specified as a ScanQuery parameter, and Ignite reserves
> partitions for SQL with the flag lazy=true. Also:
> - IndexQuery: I expect simple queries that will return a relatively small
> amount of data. Then partitions wouldn't be reserved too much time.
> - the same is for TextQuery - it returns a limited amount of data (due to
> the Lucene logic).
> - full ScanQuery it's not in use much, AFAIK. So, it's by default a pretty
> heavy operation.
>
> So, I think it's safe to reserve partitions in any case. But there could be
> an alternative optimistic approach, smth like that:
> 1. Check that topology is stable and waiting while it's not stabilized.
> 2. Run a query on a stable cluster.
> 3. In cases when a cluster becomes unstable during query execution - try to
> reserve partitions at runtime (query should be aware of topology changes)
> and fail in case of reservation failure (if a user already fetched some
> data).
>
> I don't like the idea of this optimistic approach because in case a user
> got some data, we don't have a better solution than to fail a query in case
> of cluster instability and reservation failure.
>
> Igniters, WDYT?
>
> [1] https://issues.apache.org/jira/browse/IGNITE-12591
> [2] https://issues.apache.org/jira/browse/IGNITE-16031
>


-- 

Best regards,
Ivan Pavlukhin


[DISCUSSION] Reserve partitions for CacheQueries

2021-12-08 Thread Maksim Timonin
Hi, Igniters!

There is a known issue that ScanQuery on unstable topology returns
incorrect results: duplicates data [1] or fails with an exception [2]. The
reason is ScanQuery doesn't reserve partitions.

IndexQuery shares the same query processing as ScanQuery, and then it is
also affected by unstable topology. I want to fix it for IndexQuery.
IndexQuery should provide the same stability as SQL queries do - no
occasional failures or data duplication.

I dived into the SQL query processing and found that we can unify logic for
SQL queries and cache queries (ScanQuery, IndexQuery, TextQuery):

1. Currently, cache queries use `GridCacheQueryAdapter#nodes` to find nodes
to run a query. From the other side, SQL processing uses for the same goal
`ReducePartitionMapper#nodesForPartitions`. It looks like those methods
have some slight differences, but in the common, they do the same. So, I
propose to make a single method for both cases.

2. SQL processing uses `PartitionReservationManager` for reserve partitions
on the map side. I propose to move it to the ignite-core module and start
using it for the cache queries.

3. Implement retries for the cache queries in case we failed to reserve
partitions on the map side.

Currently, I see a downside of reserving partitions for cache queries:
cache queries are lazy. And the time of partition reservation depends on a
user's application code (how fast a cursor is iterated and closed). AFAIU,
it's not very good to have a partition in reserve too long. Please, correct
me if I'm wrong here.

But from the other side, Ignite reserves partitions for ScanQuery when a
partition has been specified as a ScanQuery parameter, and Ignite reserves
partitions for SQL with the flag lazy=true. Also:
- IndexQuery: I expect simple queries that will return a relatively small
amount of data. Then partitions wouldn't be reserved too much time.
- the same is for TextQuery - it returns a limited amount of data (due to
the Lucene logic).
- full ScanQuery it's not in use much, AFAIK. So, it's by default a pretty
heavy operation.

So, I think it's safe to reserve partitions in any case. But there could be
an alternative optimistic approach, smth like that:
1. Check that topology is stable and waiting while it's not stabilized.
2. Run a query on a stable cluster.
3. In cases when a cluster becomes unstable during query execution - try to
reserve partitions at runtime (query should be aware of topology changes)
and fail in case of reservation failure (if a user already fetched some
data).

I don't like the idea of this optimistic approach because in case a user
got some data, we don't have a better solution than to fail a query in case
of cluster instability and reservation failure.

Igniters, WDYT?

[1] https://issues.apache.org/jira/browse/IGNITE-12591
[2] https://issues.apache.org/jira/browse/IGNITE-16031