Re: IgniteOutOfMemoryException in LOCAL cache mode with persistence enabled

2019-12-11 Thread Sergey Chugunov
Hi Mitchell,

I believe that the research done by Anton is correct, and the root cause of the
OOME is the proportion of memory occupied by metapages in the data region. Each
cache started in a data region allocates one or more metapages per
initialized partition, so when you run your test with only one cache this is
not a problem, but when a second cache is added this results in OOME.

I don't think there is an easy way to prevent this exception in general, but
I agree that we need to provide a more descriptive error message and/or an
early warning for the user that the configuration of caches and data regions
may lead to such an exception. I'll file a ticket for this improvement soon.
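
For illustration, a minimal sketch (cache name and partition count are made-up
values) of how that per-partition metapage overhead can be reduced by lowering
the partition count:

import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
import org.apache.ignite.configuration.CacheConfiguration;

public class LowPartitionCacheConfig {
    /** Cache configuration with a reduced partition count for small data regions. */
    public static CacheConfiguration<Integer, byte[]> cacheConfig(String name) {
        CacheConfiguration<Integer, byte[]> ccfg = new CacheConfiguration<>(name);

        // Fewer partitions mean fewer metapages allocated per cache.
        // 32 is an illustrative value; the default is 1024.
        ccfg.setAffinity(new RendezvousAffinityFunction(false, 32));

        return ccfg;
    }
}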

Best regards,
Sergey

On Thu, Dec 12, 2019 at 1:27 AM Denis Magda  wrote:

> I tend to agree with Mitchell that the cluster should not crash. If the
> crash is unavoidable given the current architecture, then the message
> should be more descriptive.
>
> Ignite persistence experts, could you please join the conversation and
> shed more light on the reported behavior?
>
> -
> Denis
>
>
> On Wed, Dec 11, 2019 at 3:25 AM Mitchell Rathbun (BLOOMBERG/ 731 LEX) <
> mrathb...@bloomberg.net> wrote:
>
>> 2 GB is not reasonable for off-heap memory for our use case. In general,
>> even if off-heap is very low, performance should just degrade and calls
>> should become blocking; I don't think that we should crash. Either way, the
>> issue seems to be with putAll, not concurrent updates of different caches
>> in the same data region. If I use Ignite's DataStreamer API instead of
>> putAll, I get much better performance and no OOM exception. Any insight
>> into why this might be would be appreciated.
>>
>> From: u...@ignite.apache.org At: 12/10/19 11:24:35
>> To: Mitchell Rathbun (BLOOMBERG/ 731 LEX ) ,
>> u...@ignite.apache.org
>> Subject: Re: IgniteOutOfMemoryException in LOCAL cache mode with
>> persistence enabled
>>
>> Hello!
>>
>> 10 MB is a very low value for testing disk performance, considering how
>> Ignite's WAL/checkpoints are structured. As already mentioned, it does not
>> even work properly.
>>
>> I recommend using a 2 GB value instead. Just load enough data so that you
>> can observe constant checkpoints.
>>
>> Regards,
>> --
>> Ilya Kasnacheev
>>
>>
>> Wed, Dec 4, 2019 at 03:16, Mitchell Rathbun (BLOOMBERG/ 731 LEX) <
>> mrathb...@bloomberg.net>:
>>
>>> For the requested full Ignite log, where would this be found if we are
>>> running in LOCAL mode? We are not explicitly running a separate Ignite
>>> node, and our WorkDirectory does not seem to have any logs.
>>>
>>> From: u...@ignite.apache.org At: 12/03/19 19:00:18
>>> To: u...@ignite.apache.org
>>> Subject: Re: IgniteOutOfMemoryException in LOCAL cache mode with
>>> persistence enabled
>>>
>>> For our configuration properties, our DataRegion initialSize and maxSize
>>> were set to 11 MB and persistence was enabled. For DataStorage, our pageSize
>>> was set to 8192 instead of 4096. For the cache, write-behind is disabled,
>>> on-heap cache is disabled, and the atomicity mode is ATOMIC.
>>>
>>> From: u...@ignite.apache.org At: 12/03/19 13:40:32
>>> To: u...@ignite.apache.org
>>> Subject: Re: IgniteOutOfMemoryException in LOCAL cache mode with
>>> persistence enabled
>>>
>>> Hi Mitchell,
>>>
>>> Looks like it can be easily reproduced with low off-heap sizes; I tried
>>> simple puts and got the same exception:
>>>
>>> class org.apache.ignite.internal.mem.IgniteOutOfMemoryException: Failed to
>>> find a page for eviction [segmentCapacity=1580, loaded=619,
>>> maxDirtyPages=465, dirtyPages=619, cpPages=0, pinnedInSegment=0,
>>> failedToPrepare=620]
>>> Out of memory in data region [name=Default_Region, initSize=10.0 MiB,
>>> maxSize=10.0 MiB, persistenceEnabled=true] Try the following:
>>> ^-- Increase maximum off-heap memory size (DataRegionConfiguration.maxSize)
>>> ^-- Enable Ignite persistence (DataRegionConfiguration.persistenceEnabled)
>>> ^-- Enable eviction or expiration policies
>>>
>>> It looks like Ignite should issue a proper warning in this case, and a
>>> couple of issues should be filed against Ignite JIRA.
>>>
>>> Check out this article on the persistent store available in the Ignite
>>> confluence as well:
>>>
>>> https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood#IgnitePersistentStore-underthehood-Checkpointing
>>>
>>> I've managed to make a similar example work with a 20 MB region with a
>>> bit of tuning, adding checkpoint-related properties to
>>> org.apache.ignite.configuration.DataStorageConfiguration.
>>>
>>> The whole idea behind this is to trigger the checkpoint on timeout rather
>>> than on the dirty-pages percentage threshold. The checkpoint page buffer
>>> size may not exceed the data region size, which is 10 MB and might be
>>> overflown during a checkpoint as well.
>>>
>>> I assume that the checkpoint is never triggered in this case because of
>>> per-partition overhead: Ignite writes some meta per partition, and it
>>> looks like at least one meta page is utilized for each.
Re: Adding support for Ignite secondary indexes to Apache Calcite planner

2019-12-11 Thread Vladimir Ozerov
Roman,

What I am trying to understand is what advantage of the materialization API
you see over the normal optimization process. Does it save optimization time,
reduce memory footprint, or maybe provide better plans? I am asking because I
do not see how expressing indexes as materializations fits the classical
optimization process. We discussed the Sort <- Scan optimization.
Let's consider another example:

LogicalSort[a ASC]
  LogicalJoin

Initially, you do not know the implementation of the join, and hence do not
know its collation. Then you may execute physical join rules, which produce,
say, PhysicalMergeJoin[a ASC]. If you execute the sort implementation rule
afterwards, you may easily eliminate the sort, or make it simpler (e.g. remove
the local sorting phase), depending on the distribution. In other words, a
proper implementation of sorting optimization assumes that you have a kind of
SortRemoveRule anyway, irrespective of whether you use materializations or
not, because sorting may be injected on top of any operator. With this in
mind, the use of materializations doesn't make the planner simpler. Nor does
it improve the outcome of the whole optimization process.
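
For clarity, such a rule could look roughly like this (an illustrative sketch
against Calcite's rule API; the limit check and the collation lookup through
the metadata query are my assumptions, not a reference implementation):

import java.util.List;
import org.apache.calcite.plan.RelOptRule;
import org.apache.calcite.plan.RelOptRuleCall;
import org.apache.calcite.rel.RelCollation;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.core.Sort;
import org.apache.calcite.rel.metadata.RelMetadataQuery;

public class SortRemoveRule extends RelOptRule {
    public SortRemoveRule() {
        super(operand(Sort.class, operand(RelNode.class, any())));
    }

    @Override public void onMatch(RelOptRuleCall call) {
        Sort sort = call.rel(0);
        RelNode input = call.rel(1);

        if (sort.offset != null || sort.fetch != null)
            return; // Keep sorts that also implement LIMIT/OFFSET.

        // If the input already delivers the sort's collation (e.g. a merge
        // join or an index scan), the Sort node is redundant.
        RelMetadataQuery mq = call.getMetadataQuery();
        List<RelCollation> collations = mq.collations(input);

        if (collations != null && collations.contains(sort.getCollation()))
            call.transformTo(input);
    }
}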

What is left, then, is lower CPU or RAM usage. Is this the case?

Wed, Dec 11, 2019 at 18:37, Roman Kondakov :

> Vladimir,
>
> the main advantage of the Phoenix approach I can see is the use of
> Calcite's native materializations API. Calcite has advanced support for
> materializations [1] and lattices [2]. Since secondary indexes can be
> considered materialized views (an index is just a sorted representation of
> the same table), we can seamlessly use views to simulate index behavior
> for the Calcite planner.
>
>
> [1] https://calcite.apache.org/docs/materialized_views.html
> [2] https://calcite.apache.org/docs/lattice.html
>
> --
> Kind Regards
> Roman Kondakov
>
>
> On 11.12.2019 17:11, Vladimir Ozerov wrote:
> > Roman,
> >
> > What is the advantage of the Phoenix approach then? BTW, it looks like
> > Phoenix integration with Calcite never made it to production, did it?
> >
> > Tue, Dec 10, 2019 at 19:50, Roman Kondakov :
> >
> >> Hi Vladimir,
> >>
> >> from what I understand, Drill does not exploit the collation of indexes.
> >> To be precise, it does not exploit index collation in the "natural" way
> >> where, say, we have a sorted TableScan and hence do not create a new Sort.
> >> Instead, Drill always creates a Sort operator, but if the TableScan can
> >> be replaced with an IndexScan, this Sort operator is removed by a
> >> dedicated rule.
> >>
> >> Let's consider an initial operator tree:
> >>
> >> Project
> >>   Sort
> >> TableScan
> >>
> >> after applying rule DbScanToIndexScanPrule this tree will be converted to:
> >>
> >> Project
> >>   Sort
> >> IndexScan
> >>
> >> and finally, after applying DbScanSortRemovalRule we have:
> >>
> >> Project
> >>   IndexScan
> >>
> >> while for the Phoenix approach we would have two equivalent subsets in our
> >> planner:
> >>
> >> Project
> >>   Sort
> >> TableScan
> >>
> >> and
> >>
> >> Project
> >>   IndexScan
> >>
> >> and most likely the last plan will be chosen as the best one.
> >>
> >> --
> >> Kind Regards
> >> Roman Kondakov
> >>
> >>
> >> On 10.12.2019 17:19, Vladimir Ozerov wrote:
> >>> Hi Roman,
> >>>
> >>> Why do you think that Drill-style will not let you exploit collation?
> >>> Collation should be propagated from the index scan in the same way as in
> >>> other sorted operators, such as merge join or streaming aggregate,
> >>> provided that you use the converter hack (or any alternative solution to
> >>> trigger parent re-analysis).
> >>> In other words, propagation of collation from Drill-style indexes should
> >>> be no different from other sorted operators.
> >>>
> >>> Regards,
> >>> Vladimir.
> >>>
> >>> Tue, Dec 10, 2019 at 16:40, Zhenya Stanilovsky:
> >>>
> 
>  Roman, just a quick remark: Phoenix builds its approach on the already
>  existing monolithic HBase architecture; in most cases it's just a stub
>  for someone who wants to use secondary indexes with a database that has
>  no native support for them. I don't think it's a good idea here.
> 
> >
> >
> > --- Forwarded message ---
> > From: "Roman Kondakov" < kondako...@mail.ru.invalid >
> > To:  dev@ignite.apache.org
> > Cc:
> > Subject: Adding support for Ignite secondary indexes to Apache
> Calcite
> > planner
> > Date: Tue, 10 Dec 2019 15:55:52 +0300
> >
> > Hi all!
> >
> > As you may know, there is ongoing work on integrating the Apache Calcite
> > query optimizer into the Ignite codebase [1],[2].
> >
> > One of a bunch of problems in this integration is the absence of
> > out-of-the-box support for secondary indexes in Apache Calcite. After
> > some research I came to the conclusion that this problem has a couple of
> > workarounds. Let's name them:
> > 1. Phoenix-style approach - representing secondary indexes as
> > materialized views, which are natively supported by the Calcite engine [3]

Re: IgniteOutOfMemoryException in LOCAL cache mode with persistence enabled

2019-12-11 Thread Denis Magda
I tend to agree with Mitchell that the cluster should not crash. If the
crash is unavoidable given the current architecture, then the message
should be more descriptive.

Ignite persistence experts, could you please join the conversation and shed
more light on the reported behavior?

-
Denis


On Wed, Dec 11, 2019 at 3:25 AM Mitchell Rathbun (BLOOMBERG/ 731 LEX) <
mrathb...@bloomberg.net> wrote:

> 2 GB is not reasonable for off-heap memory for our use case. In general,
> even if off-heap is very low, performance should just degrade and calls
> should become blocking; I don't think that we should crash. Either way, the
> issue seems to be with putAll, not concurrent updates of different caches
> in the same data region. If I use Ignite's DataStreamer API instead of
> putAll, I get much better performance and no OOM exception. Any insight
> into why this might be would be appreciated.
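>
> For reference, the loading path we switched to looks roughly like this
> (cache name and key/value types are illustrative):
>
> import java.util.Map;
> import org.apache.ignite.Ignite;
> import org.apache.ignite.IgniteDataStreamer;
>
> public class BulkLoad {
>     /** Loads a batch through a data streamer instead of cache.putAll(batch). */
>     public static void load(Ignite ignite, Map<Integer, byte[]> batch) {
>         try (IgniteDataStreamer<Integer, byte[]> streamer = ignite.dataStreamer("myCache")) {
>             streamer.allowOverwrite(true); // Behave like putAll for existing keys.
>
>             for (Map.Entry<Integer, byte[]> e : batch.entrySet())
>                 streamer.addData(e.getKey(), e.getValue());
>         } // close() flushes any remaining buffered entries.
>     }
> }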
>
> From: u...@ignite.apache.org At: 12/10/19 11:24:35
> To: Mitchell Rathbun (BLOOMBERG/ 731 LEX ) ,
> u...@ignite.apache.org
> Subject: Re: IgniteOutOfMemoryException in LOCAL cache mode with
> persistence enabled
>
> Hello!
>
> 10 MB is a very low value for testing disk performance, considering how
> Ignite's WAL/checkpoints are structured. As already mentioned, it does not
> even work properly.
>
> I recommend using a 2 GB value instead. Just load enough data so that you
> can observe constant checkpoints.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> Wed, Dec 4, 2019 at 03:16, Mitchell Rathbun (BLOOMBERG/ 731 LEX) <
> mrathb...@bloomberg.net>:
>
>> For the requested full Ignite log, where would this be found if we are
>> running in LOCAL mode? We are not explicitly running a separate Ignite
>> node, and our WorkDirectory does not seem to have any logs.
>>
>> From: u...@ignite.apache.org At: 12/03/19 19:00:18
>> To: u...@ignite.apache.org
>> Subject: Re: IgniteOutOfMemoryException in LOCAL cache mode with
>> persistence enabled
>>
>> For our configuration properties, our DataRegion initialSize and maxSize
>> were set to 11 MB and persistence was enabled. For DataStorage, our pageSize
>> was set to 8192 instead of 4096. For the cache, write-behind is disabled,
>> on-heap cache is disabled, and the atomicity mode is ATOMIC.
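>>
>> In code, that corresponds roughly to the following sketch (region name and
>> class scaffolding are illustrative):
>>
>> import org.apache.ignite.configuration.DataRegionConfiguration;
>> import org.apache.ignite.configuration.DataStorageConfiguration;
>> import org.apache.ignite.configuration.IgniteConfiguration;
>>
>> public class SmallRegionConfig {
>>     public static IgniteConfiguration config() {
>>         DataRegionConfiguration region = new DataRegionConfiguration()
>>             .setName("Default_Region")
>>             .setInitialSize(11L * 1024 * 1024) // 11 MB
>>             .setMaxSize(11L * 1024 * 1024)
>>             .setPersistenceEnabled(true);
>>
>>         DataStorageConfiguration storage = new DataStorageConfiguration()
>>             .setPageSize(8192) // Instead of the default 4096.
>>             .setDefaultDataRegionConfiguration(region);
>>
>>         return new IgniteConfiguration().setDataStorageConfiguration(storage);
>>     }
>> }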
>>
>> From: u...@ignite.apache.org At: 12/03/19 13:40:32
>> To: u...@ignite.apache.org
>> Subject: Re: IgniteOutOfMemoryException in LOCAL cache mode with
>> persistence enabled
>>
>> Hi Mitchell,
>>
>> Looks like it can be easily reproduced with low off-heap sizes; I tried
>> simple puts and got the same exception:
>>
>> class org.apache.ignite.internal.mem.IgniteOutOfMemoryException: Failed to
>> find a page for eviction [segmentCapacity=1580, loaded=619,
>> maxDirtyPages=465, dirtyPages=619, cpPages=0, pinnedInSegment=0,
>> failedToPrepare=620]
>> Out of memory in data region [name=Default_Region, initSize=10.0 MiB,
>> maxSize=10.0 MiB, persistenceEnabled=true] Try the following:
>> ^-- Increase maximum off-heap memory size (DataRegionConfiguration.maxSize)
>> ^-- Enable Ignite persistence (DataRegionConfiguration.persistenceEnabled)
>> ^-- Enable eviction or expiration policies
>>
>> It looks like Ignite should issue a proper warning in this case, and a
>> couple of issues should be filed against Ignite JIRA.
>>
>> Check out this article on the persistent store available in the Ignite
>> confluence as well:
>>
>> https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood#IgnitePersistentStore-underthehood-Checkpointing
>>
>> I've managed to make a similar example work with a 20 MB region with a bit
>> of tuning, adding checkpoint-related properties to
>> org.apache.ignite.configuration.DataStorageConfiguration along these lines
>> (a sketch; the values are illustrative):
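>>
>> import org.apache.ignite.configuration.DataRegionConfiguration;
>> import org.apache.ignite.configuration.DataStorageConfiguration;
>>
>> public class CheckpointTuning {
>>     public static DataStorageConfiguration tuned() {
>>         DataRegionConfiguration region = new DataRegionConfiguration()
>>             .setName("Default_Region")
>>             .setInitialSize(20L * 1024 * 1024) // The 20 MB region from above.
>>             .setMaxSize(20L * 1024 * 1024)
>>             .setPersistenceEnabled(true)
>>             // Keep the checkpoint page buffer within the region size.
>>             .setCheckpointPageBufferSize(10L * 1024 * 1024);
>>
>>         return new DataStorageConfiguration()
>>             // Fire checkpoints on a timer rather than on the dirty-pages
>>             // threshold, which a tiny region may never reach.
>>             .setCheckpointFrequency(3_000L)
>>             .setDefaultDataRegionConfiguration(region);
>>     }
>> }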
>>
>> The whole idea behind this is to trigger the checkpoint on timeout rather
>> than on the dirty-pages percentage threshold. The checkpoint page buffer
>> size may not exceed the data region size, which is 10 MB and might be
>> overflown during a checkpoint as well.
>>
>> I assume that the checkpoint is never triggered in this case because of
>> per-partition overhead: Ignite writes some meta per partition, and it looks
>> like at least one meta page is utilized for each, which results in some
>> amount of off-heap devoured by these meta pages. With the lowest possible
>> region size, this might consume more than 3 MB for a cache with 1k
>> partitions, and the 70% dirty-data-pages threshold would never be reached.
>>
>> However, I found another issue where it is not possible to save the meta
>> page on checkpoint begin; this reproduces on a 10 MB data region with the
>> mentioned storage configuration options.
>>
>> Could you please describe your configuration if you have anything
>> different from the defaults (page size, WAL mode, partition count) and the
>> types of key/value that you use? And if possible, could you please attach
>> the full Ignite log from the node that suffered the IOOM?
>>
>> As for the data region/cache, in reality you do also 

Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

2019-12-11 Thread Yuriy Shuliga
I will look into the MOVING partition issue,
but I also need some guidance there.

Ivan, would you mind being that person?

The question is whether we have an issue with:
- wrong storage targets during indexing, OR
- incorrect node/partition selection during querying?

BR,
Yuriy Shluiha



--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/


Re: Adding support for Ignite secondary indexes to Apache Calcite planner

2019-12-11 Thread Roman Kondakov
Vladimir,

the main advantage of the Phoenix approach I can see is the use of
Calcite's native materializations API. Calcite has advanced support for
materializations [1] and lattices [2]. Since secondary indexes can be
considered materialized views (an index is just a sorted representation of
the same table), we can seamlessly use views to simulate index behavior
for the Calcite planner.


[1] https://calcite.apache.org/docs/materialized_views.html
[2] https://calcite.apache.org/docs/lattice.html
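
As a rough sketch of the mechanics (a sketch only: the API shapes are as I
recall them in recent Calcite, and the RelNode construction, schema name and
index name are illustrative assumptions):

import java.util.Arrays;
import org.apache.calcite.plan.RelOptMaterialization;
import org.apache.calcite.plan.RelOptPlanner;
import org.apache.calcite.rel.RelNode;

public class IndexRegistration {
    /**
     * Registers an index (exposed as a sorted "table") as a materialization:
     * indexRel is a scan of the index table, queryRel is the equivalent plan
     * over the base table (scan + sort by the indexed columns).
     */
    public static void registerIndex(RelOptPlanner planner,
        RelNode indexRel, RelNode queryRel) {
        planner.addMaterialization(
            new RelOptMaterialization(indexRel, queryRel, null,
                Arrays.asList("PUBLIC", "PERSON_NAME_IDX")));
    }
}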

-- 
Kind Regards
Roman Kondakov


On 11.12.2019 17:11, Vladimir Ozerov wrote:
> Roman,
> 
> What is the advantage of the Phoenix approach then? BTW, it looks like Phoenix
> integration with Calcite never made it to production, did it?
> 
> Tue, Dec 10, 2019 at 19:50, Roman Kondakov :
> 
>> Hi Vladimir,
>>
>> from what I understand, Drill does not exploit the collation of indexes.
>> To be precise, it does not exploit index collation in the "natural" way
>> where, say, we have a sorted TableScan and hence do not create a new Sort.
>> Instead, Drill always creates a Sort operator, but if the TableScan can
>> be replaced with an IndexScan, this Sort operator is removed by a
>> dedicated rule.
>>
>> Let's consider an initial operator tree:
>>
>> Project
>>   Sort
>> TableScan
>>
>> after applying rule DbScanToIndexScanPrule this tree will be converted to:
>>
>> Project
>>   Sort
>> IndexScan
>>
>> and finally, after applying DbScanSortRemovalRule we have:
>>
>> Project
>>   IndexScan
>>
>> while for the Phoenix approach we would have two equivalent subsets in our
>> planner:
>>
>> Project
>>   Sort
>> TableScan
>>
>> and
>>
>> Project
>>   IndexScan
>>
>> and most likely the last plan will be chosen as the best one.
>>
>> --
>> Kind Regards
>> Roman Kondakov
>>
>>
>> On 10.12.2019 17:19, Vladimir Ozerov wrote:
>>> Hi Roman,
>>>
>>> Why do you think that Drill-style will not let you exploit collation?
>>> Collation should be propagated from the index scan in the same way as in
>>> other sorted operators, such as merge join or streaming aggregate,
>>> provided that you use the converter hack (or any alternative solution to
>>> trigger parent re-analysis).
>>> In other words, propagation of collation from Drill-style indexes should
>>> be no different from other sorted operators.
>>>
>>> Regards,
>>> Vladimir.
>>>
>>> Tue, Dec 10, 2019 at 16:40, Zhenya Stanilovsky:
>>>

 Roman, just a quick remark: Phoenix builds its approach on the already
 existing monolithic HBase architecture; in most cases it's just a stub for
 someone who wants to use secondary indexes with a database that has no
 native support for them. I don't think it's a good idea here.

>
>
> --- Forwarded message ---
> From: "Roman Kondakov" < kondako...@mail.ru.invalid >
> To:  dev@ignite.apache.org
> Cc:
> Subject: Adding support for Ignite secondary indexes to Apache Calcite
> planner
> Date: Tue, 10 Dec 2019 15:55:52 +0300
>
> Hi all!
>
> As you may know, there is ongoing work on integrating the Apache Calcite
> query optimizer into the Ignite codebase [1],[2].
>
> One of a bunch of problems in this integration is the absence of
> out-of-the-box support for secondary indexes in Apache Calcite. After
> some research I came to the conclusion that this problem has a couple of
> workarounds. Let's name them:
> 1. Phoenix-style approach - representing secondary indexes as
> materialized views, which are natively supported by the Calcite engine [3]
> 2. Drill-style approach - pushing filters into the table scans and
> choosing an appropriate index for lookups when possible [4]
>
> Both these approaches have advantages and disadvantages:
>
> Phoenix style pros:
> - natural way of adding indexes as an alternative source of rows: index
> can be considered as a kind of sorted materialized view.
> - possibility of using index sortedness for stream aggregates,
> deduplication (DISTINCT operator), merge joins, etc.
> - ability to support other types of indexes (e.g. functional indexes).
>
> Phoenix style cons:
> - polluting the optimizer's search space with extra table scans, hence
> increasing the planning time.
>
> Drill style pros:
> - easier to implement (although it's questionable).
> - search space is not inflated.
>
> Drill style cons:
> - missed opportunity to exploit sortedness.
>
> A good discussion of both approaches can be found in [5].
>
> I made a small sketch [6] in order to demonstrate the applicability of
> the Phoenix approach to Ignite. Key design concepts are:
> 1. On creation, indexes are registered as tables in the Calcite schema.
> This step is needed for Calcite's internal routines.
> 2. On planner initialization we register these indexes as materialized
> views in Calcite's optimizer using the VolcanoPlanner#addMaterialization
> method.

[jira] [Created] (IGNITE-12437) .NET: Run tests on macOS TeamCity agent

2019-12-11 Thread Pavel Tupitsyn (Jira)
Pavel Tupitsyn created IGNITE-12437:
---

 Summary: .NET: Run tests on macOS TeamCity agent
 Key: IGNITE-12437
 URL: https://issues.apache.org/jira/browse/IGNITE-12437
 Project: Ignite
  Issue Type: Task
  Components: platforms
Reporter: Pavel Tupitsyn
Assignee: Pavel Tupitsyn


There is one macOS agent on TC. Run tests there to ensure full support of
Ignite.NET on macOS



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Adding support for Ignite secondary indexes to Apache Calcite planner

2019-12-11 Thread Ivan Pavlukhin
Vladimir,

You are right, the Phoenix integration with Calcite stalled halfway. See
[1] for some of the reasons.

[1] 
https://lists.apache.org/thread.html/0152a97bfebb85c74f10e26e94ab9cd416dec374abba7dc2e1af9d61%40%3Cdev.phoenix.apache.org%3E

Wed, Dec 11, 2019 at 17:11, Vladimir Ozerov :
>
> Roman,
>
> What is the advantage of the Phoenix approach then? BTW, it looks like Phoenix
> integration with Calcite never made it to production, did it?
>
> Tue, Dec 10, 2019 at 19:50, Roman Kondakov :
>
> > Hi Vladimir,
> >
> > from what I understand, Drill does not exploit the collation of indexes.
> > To be precise, it does not exploit index collation in the "natural" way
> > where, say, we have a sorted TableScan and hence do not create a new Sort.
> > Instead, Drill always creates a Sort operator, but if the TableScan can
> > be replaced with an IndexScan, this Sort operator is removed by a
> > dedicated rule.
> >
> > Let's consider an initial operator tree:
> >
> > Project
> >   Sort
> > TableScan
> >
> > after applying rule DbScanToIndexScanPrule this tree will be converted to:
> >
> > Project
> >   Sort
> > IndexScan
> >
> > and finally, after applying DbScanSortRemovalRule we have:
> >
> > Project
> >   IndexScan
> >
> > while for the Phoenix approach we would have two equivalent subsets in our
> > planner:
> >
> > Project
> >   Sort
> > TableScan
> >
> > and
> >
> > Project
> >   IndexScan
> >
> > and most likely the last plan will be chosen as the best one.
> >
> > --
> > Kind Regards
> > Roman Kondakov
> >
> >
> > On 10.12.2019 17:19, Vladimir Ozerov wrote:
> > > Hi Roman,
> > >
> > > Why do you think that Drill-style will not let you exploit collation?
> > > Collation should be propagated from the index scan in the same way as in
> > > other sorted operators, such as merge join or streaming aggregate,
> > > provided that you use the converter hack (or any alternative solution to
> > > trigger parent re-analysis).
> > > In other words, propagation of collation from Drill-style indexes should
> > > be no different from other sorted operators.
> > >
> > > Regards,
> > > Vladimir.
> > >
> > > Tue, Dec 10, 2019 at 16:40, Zhenya Stanilovsky:
> > >
> > >>
> > >> Roman, just a quick remark: Phoenix builds its approach on the already
> > >> existing monolithic HBase architecture; in most cases it's just a stub
> > >> for someone who wants to use secondary indexes with a database that has
> > >> no native support for them. I don't think it's a good idea here.
> > >>
> > >>>
> > >>>
> > >>> --- Forwarded message ---
> > >>> From: "Roman Kondakov" < kondako...@mail.ru.invalid >
> > >>> To:  dev@ignite.apache.org
> > >>> Cc:
> > >>> Subject: Adding support for Ignite secondary indexes to Apache Calcite
> > >>> planner
> > >>> Date: Tue, 10 Dec 2019 15:55:52 +0300
> > >>>
> > >>> Hi all!
> > >>>
> > >>> As you may know, there is ongoing work on integrating the Apache
> > >>> Calcite query optimizer into the Ignite codebase [1],[2].
> > >>>
> > >>> One of a bunch of problems in this integration is the absence of
> > >>> out-of-the-box support for secondary indexes in Apache Calcite. After
> > >>> some research I came to the conclusion that this problem has a couple
> > >>> of workarounds. Let's name them:
> > >>> 1. Phoenix-style approach - representing secondary indexes as
> > >>> materialized views, which are natively supported by the Calcite engine [3]
> > >>> 2. Drill-style approach - pushing filters into the table scans and
> > >>> choosing an appropriate index for lookups when possible [4]
> > >>>
> > >>> Both these approaches have advantages and disadvantages:
> > >>>
> > >>> Phoenix style pros:
> > >>> - natural way of adding indexes as an alternative source of rows: index
> > >>> can be considered as a kind of sorted materialized view.
> > >>> - possibility of using index sortedness for stream aggregates,
> > >>> deduplication (DISTINCT operator), merge joins, etc.
> > >>> - ability to support other types of indexes (e.g. functional indexes).
> > >>>
> > >>> Phoenix style cons:
> > >>> - polluting the optimizer's search space with extra table scans, hence
> > >>> increasing the planning time.
> > >>>
> > >>> Drill style pros:
> > >>> - easier to implement (although it's questionable).
> > >>> - search space is not inflated.
> > >>>
> > >>> Drill style cons:
> > >>> - missed opportunity to exploit sortedness.
> > >>>
> > >>> A good discussion of both approaches can be found in [5].
> > >>>
> > >>> I made a small sketch [6] in order to demonstrate the applicability of
> > >>> the Phoenix approach to Ignite. Key design concepts are:
> > >>> 1. On creation, indexes are registered as tables in the Calcite
> > >>> schema. This step is needed for Calcite's internal routines.
> > >>> 2. On planner initialization we register these indexes as materialized
> > >>> views in Calcite's optimizer using the VolcanoPlanner#addMaterialization
> > >>> method.
> > >>> 3. Right before the query execution Calcite selects all materialized
> > >>> views (indexes) which can potentially be used in the query.

Re: Adding support for Ignite secondary indexes to Apache Calcite planner

2019-12-11 Thread Vladimir Ozerov
Roman,

What is the advantage of the Phoenix approach then? BTW, it looks like Phoenix
integration with Calcite never made it to production, did it?

Tue, Dec 10, 2019 at 19:50, Roman Kondakov :

> Hi Vladimir,
>
> from what I understand, Drill does not exploit the collation of indexes.
> To be precise, it does not exploit index collation in the "natural" way
> where, say, we have a sorted TableScan and hence do not create a new Sort.
> Instead, Drill always creates a Sort operator, but if the TableScan can
> be replaced with an IndexScan, this Sort operator is removed by a
> dedicated rule.
>
> Let's consider an initial operator tree:
>
> Project
>   Sort
> TableScan
>
> after applying rule DbScanToIndexScanPrule this tree will be converted to:
>
> Project
>   Sort
> IndexScan
>
> and finally, after applying DbScanSortRemovalRule we have:
>
> Project
>   IndexScan
>
> while for the Phoenix approach we would have two equivalent subsets in our
> planner:
>
> Project
>   Sort
> TableScan
>
> and
>
> Project
>   IndexScan
>
> and most likely the last plan will be chosen as the best one.
>
> --
> Kind Regards
> Roman Kondakov
>
>
> On 10.12.2019 17:19, Vladimir Ozerov wrote:
> > Hi Roman,
> >
> > Why do you think that Drill-style will not let you exploit collation?
> > Collation should be propagated from the index scan in the same way as in
> > other sorted operators, such as merge join or streaming aggregate,
> > provided that you use the converter hack (or any alternative solution to
> > trigger parent re-analysis).
> > In other words, propagation of collation from Drill-style indexes should
> > be no different from other sorted operators.
> >
> > Regards,
> > Vladimir.
> >
> > Tue, Dec 10, 2019 at 16:40, Zhenya Stanilovsky:
> >
> >>
> >> Roman, just a quick remark: Phoenix builds its approach on the already
> >> existing monolithic HBase architecture; in most cases it's just a stub
> >> for someone who wants to use secondary indexes with a database that has
> >> no native support for them. I don't think it's a good idea here.
> >>
> >>>
> >>>
> >>> --- Forwarded message ---
> >>> From: "Roman Kondakov" < kondako...@mail.ru.invalid >
> >>> To:  dev@ignite.apache.org
> >>> Cc:
> >>> Subject: Adding support for Ignite secondary indexes to Apache Calcite
> >>> planner
> >>> Date: Tue, 10 Dec 2019 15:55:52 +0300
> >>>
> >>> Hi all!
> >>>
> >>> As you may know, there is ongoing work on integrating the Apache Calcite
> >>> query optimizer into the Ignite codebase [1],[2].
> >>>
> >>> One of a bunch of problems in this integration is the absence of
> >>> out-of-the-box support for secondary indexes in Apache Calcite. After
> >>> some research I came to the conclusion that this problem has a couple of
> >>> workarounds. Let's name them:
> >>> 1. Phoenix-style approach - representing secondary indexes as
> >>> materialized views, which are natively supported by the Calcite engine [3]
> >>> 2. Drill-style approach - pushing filters into the table scans and
> >>> choosing an appropriate index for lookups when possible [4]
> >>>
> >>> Both these approaches have advantages and disadvantages:
> >>>
> >>> Phoenix style pros:
> >>> - natural way of adding indexes as an alternative source of rows: index
> >>> can be considered as a kind of sorted materialized view.
> >>> - possibility of using index sortedness for stream aggregates,
> >>> deduplication (DISTINCT operator), merge joins, etc.
> >>> - ability to support other types of indexes (e.g. functional indexes).
> >>>
> >>> Phoenix style cons:
> >>> - polluting the optimizer's search space with extra table scans, hence
> >>> increasing the planning time.
> >>>
> >>> Drill style pros:
> >>> - easier to implement (although it's questionable).
> >>> - search space is not inflated.
> >>>
> >>> Drill style cons:
> >>> - missed opportunity to exploit sortedness.
> >>>
> >>> A good discussion of both approaches can be found in [5].
> >>>
> >>> I made a small sketch [6] in order to demonstrate the applicability of
> >>> the Phoenix approach to Ignite. Key design concepts are:
> >>> 1. On creation, indexes are registered as tables in the Calcite schema.
> >>> This step is needed for Calcite's internal routines.
> >>> 2. On planner initialization we register these indexes as materialized
> >>> views in Calcite's optimizer using the VolcanoPlanner#addMaterialization
> >>> method.
> >>> 3. Right before the query execution Calcite selects all materialized
> >>> views (indexes) which can potentially be used in the query.
> >>> 4. During query optimization indexes are registered by the planner as
> >>> usual TableScans and hence can be chosen by the optimizer if they have
> >>> a lower cost.
> >>>
> >>> This sketch shows the ability to exploit index sortedness only, so
> >>> future work in this direction should focus on using indexes for fast
> >>> index lookups. At first glance, FilterableTable and
> >>> FilterTableScanRule are good points to start. We can push Filter into
> >>> the 

[jira] [Created] (IGNITE-12436) ignite.properties must be handled by maven update-version profile

2019-12-11 Thread Maxim Muzafarov (Jira)
Maxim Muzafarov created IGNITE-12436:


 Summary: ignite.properties must be handled by maven update-version 
profile
 Key: IGNITE-12436
 URL: https://issues.apache.org/jira/browse/IGNITE-12436
 Project: Ignite
  Issue Type: Bug
Reporter: Maxim Muzafarov
Assignee: Maxim Muzafarov


Currently {{ignite\modules\core\src\main\resources\ignite.properties}} is
manually edited to change the actually used {{ignite.version}} (e.g. when a
new release version occurs).

This should be done automatically when the update-version profile is active.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-12435) [Spark] Add support for saving to existing table via saveAsTable

2019-12-11 Thread Alexey Zinoviev (Jira)
Alexey Zinoviev created IGNITE-12435:


 Summary: [Spark] Add support for saving to existing table via 
saveAsTable
 Key: IGNITE-12435
 URL: https://issues.apache.org/jira/browse/IGNITE-12435
 Project: Ignite
  Issue Type: Sub-task
  Components: spark
Reporter: Alexey Zinoviev
Assignee: Alexey Zinoviev
 Fix For: 2.9


Tests in IgniteSQLDataFrameIgniteSessionWriteSpec are muted due to a strange
error related to working with filesystems and schemas.

All three tests generate the same error when trying to call saveAsTable as a
terminal operation on a DataFrame write:

java.io.IOException: No FileSystem for scheme: ignite
 at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2586)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2593)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
 at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
 at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
 at org.apache.spark.sql.catalyst.catalog.SessionCatalog.validateTableLocation(SessionCatalog.scala:333)
 at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:170)
 at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
 at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
 at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
 at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
 at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
 at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
 at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
 at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
 at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
 at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
 at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
 at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
 at org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:474)
 at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:449)
 at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:409)
 at org.apache.ignite.spark.IgniteSQLDataFrameIgniteSessionWriteSpec$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(IgniteSQLDataFrameIgniteSessionWriteSpec.scala:45)
 at org.apache.ignite.spark.IgniteSQLDataFrameIgniteSessionWriteSpec$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(IgniteSQLDataFrameIgniteSessionWriteSpec.scala:35)
 at org.apache.ignite.spark.IgniteSQLDataFrameIgniteSessionWriteSpec$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(IgniteSQLDataFrameIgniteSessionWriteSpec.scala:35)
 at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
 at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
 at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
 at org.scalatest.Transformer.apply(Transformer.scala:22)
 at org.scalatest.Transformer.apply(Transformer.scala:20)
 at org.scalatest.FunSpecLike$$anon$1.apply(FunSpecLike.scala:422)
 at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
 at org.scalatest.FunSpec.withFixture(FunSpec.scala:1626)
 at org.scalatest.FunSpecLike$class.invokeWithFixture$1(FunSpecLike.scala:419)
 at org.scalatest.FunSpecLike$$anonfun$runTest$1.apply(FunSpecLike.scala:431)
 at org.scalatest.FunSpecLike$$anonfun$runTest$1.apply(FunSpecLike.scala:431)
 at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
 at org.scalatest.FunSpecLike$class.runTest(FunSpecLike.scala:431)
 at org.apache.ignite.spark.AbstractDataFrameSpec.org$scalatest$BeforeAndAfter$$super$runTest(AbstractDataFrameSpec.scala:39)
 at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200)

[jira] [Created] (IGNITE-12434) Dump checkpoint readLock holder threads if writeLock can`t take lock more than threshold timeout.

2019-12-11 Thread Stanilovsky Evgeny (Jira)
Stanilovsky Evgeny created IGNITE-12434:
---

 Summary: Dump checkpoint readLock holder threads if writeLock 
can`t take lock more than threshold timeout.
 Key: IGNITE-12434
 URL: https://issues.apache.org/jira/browse/IGNITE-12434
 Project: Ignite
  Issue Type: Improvement
  Components: persistence
Affects Versions: 2.7.6
Reporter: Stanilovsky Evgeny
Assignee: Stanilovsky Evgeny


Huge cache operations like removeAll, or hardware problems followed by a GC
pause, can hold the checkpoint readLock for a long time. This can lead to a
long writeLock wait and, as a result, a long checkpoint delay; it would be
very informative to log such situations.
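
A minimal sketch of the proposed behavior (illustrative only; this is not
Ignite's actual checkpoint lock implementation):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CheckpointLockDiagnostics {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Takes the write lock, dumping all threads whenever the wait exceeds the threshold. */
    public void writeLockWithDump(long thresholdMs) throws InterruptedException {
        while (!lock.writeLock().tryLock(thresholdMs, TimeUnit.MILLISECONDS)) {
            // Threshold exceeded: dump every thread so read-lock holders are visible.
            for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(true, true))
                System.err.print(info);
        }
    }

    public void unlock() {
        lock.writeLock().unlock();
    }
}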



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Duplicate column name while creating table

2019-12-11 Thread Ilya Kasnacheev
Hello!

I have filed an issue https://issues.apache.org/jira/browse/IGNITE-12433

Regards,
-- 
Ilya Kasnacheev


Wed, Dec 11, 2019 at 12:43, Denis Magda :

> It sounds like an implementation specificity of Ignite DML.
>
> SQL folks, how about throwing an exception in case of the duplicate name?
>
> -
> Denis
>
>
> On Thu, Dec 5, 2019 at 9:38 AM DS 
> wrote:
>
>> *I am able to create a table with a duplicate column name.*
>>
>> I was expecting an error saying "cannot create table; duplicate column
>> name: NAME".
>> Is there some reason that Ignite is not throwing an error/exception, or
>> is it a bug?
>>
>> CREATE TABLE Person(ID INTEGER PRIMARY KEY,  NAME VARCHAR(100),  NAME
>> VARCHAR(100),  AGE INTEGER (64));
>>
>>
>>
>>
>> --
>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>
>


[jira] [Created] (IGNITE-12433) Possible to create table with duplicate definition of column

2019-12-11 Thread Ilya Kasnacheev (Jira)
Ilya Kasnacheev created IGNITE-12433:


 Summary: Possible to create table with duplicate definition of 
column
 Key: IGNITE-12433
 URL: https://issues.apache.org/jira/browse/IGNITE-12433
 Project: Ignite
  Issue Type: Bug
  Components: sql
Affects Versions: 2.8
Reporter: Ilya Kasnacheev


{code}
sqlline version 1.3.0
sqlline> !connect jdbc:ignite:thin://localhost
Enter username for jdbc:ignite:thin://localhost: 
Enter password for jdbc:ignite:thin://localhost: 
0: jdbc:ignite:thin://localhost> CREATE TABLE Person(ID INTEGER PRIMARY KEY,  
NAME VARCHAR(100),  NAME
. . . . . . . . . . . . . . . .> VARCHAR(100),  AGE INTEGER (64));
No rows affected (0,229 seconds)
0: jdbc:ignite:thin://localhost> select * from person;
++++
|   ID   |  NAME  |  AGE   |
++++
++++
No rows selected (0,073 seconds)
{code}

This is on master branch. "NAME VARCHAR(100)" twice.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-12432) [Spark] Need to add test for AVG function in IgniteOptimizationAggregationFuncSpec

2019-12-11 Thread Alexey Zinoviev (Jira)
Alexey Zinoviev created IGNITE-12432:


 Summary: [Spark] Need to add test for AVG function in 
IgniteOptimizationAggregationFuncSpec
 Key: IGNITE-12432
 URL: https://issues.apache.org/jira/browse/IGNITE-12432
 Project: Ignite
  Issue Type: Test
  Components: spark
Reporter: Alexey Zinoviev
Assignee: Alexey Zinoviev
 Fix For: 2.9


The test is skipped with a TODO: write me

it("AVG - DECIMAL") {
 //TODO: write me
}

It should be merged to Spark 2.3 and 2.4 together.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Duplicate column name while creating table

2019-12-11 Thread Denis Magda
It sounds like an implementation specificity of Ignite DML.

SQL folks, how about throwing an exception in case of the duplicate name?

-
Denis


On Thu, Dec 5, 2019 at 9:38 AM DS 
wrote:

> *I am able to create a table with a duplicate column name.*
>
> I was expecting an error saying "cannot create table; duplicate column
> name: NAME".
> Is there some reason that Ignite is not throwing an error/exception, or
> is it a bug?
>
> CREATE TABLE Person(ID INTEGER PRIMARY KEY,  NAME VARCHAR(100),  NAME
> VARCHAR(100),  AGE INTEGER (64));
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>