Re: IgniteOutOfMemoryException in LOCAL cache mode with persistence enabled
Hi Mitchell, I believe that the research done by Anton is correct, and the root cause of the OOME is the proportion of memory occupied by meta pages in the data region. Each cache started in a data region allocates one or more meta pages per initialized partition, so when you run your test with only one cache this is not a problem, but when a second cache is added it results in an OOME. I don't think there is an easy way to prevent this exception in general, but I agree that we need to provide a more descriptive error message and/or an early warning for the user that the configuration of caches and data regions may lead to such an exception. I'll file a ticket for this improvement soon. Best regards, Sergey On Thu, Dec 12, 2019 at 1:27 AM Denis Magda wrote: > I tend to agree with Mitchell that the cluster should not crash. If the > crash is unavoidable based on the current architecture then a message > should be more descriptive. > > Ignite persistence experts, could you please join the conversation and > shed more light to the reported behavior? > > - > Denis > > > On Wed, Dec 11, 2019 at 3:25 AM Mitchell Rathbun (BLOOMBERG/ 731 LEX) < > mrathb...@bloomberg.net> wrote: > >> 2 GB is not reasonable for off heap memory for our use case. In general, >> even if off-heap is very low, performance should just degrade and calls >> should become blocking, I don't think that we should crash. Either way, the >> issue seems to be with putAll, not concurrent updates of different caches >> in the same data region. If I use Ignite's DataStreamer API instead of >> putAll, I get much better performance and no OOM exception. Any insight >> into why this might be would be appreciated. >> >> From: u...@ignite.apache.org At: 12/10/19 11:24:35 >> To: Mitchell Rathbun (BLOOMBERG/ 731 LEX ) , >> u...@ignite.apache.org >> Subject: Re: IgniteOutOfMemoryException in LOCAL cache mode with >> persistence enabled >> >> Hello!
>> >> 10M is very very low-ball for testing performance of disk, considering >> how Ignite's wal/checkpoints are structured. As already told, it does not >> even work properly. >> >> I recommend using 2G value instead. Just load enough data so that you can >> observe constant checkpoints. >> >> Regards, >> -- >> Ilya Kasnacheev >> >> >> ср, 4 дек. 2019 г. в 03:16, Mitchell Rathbun (BLOOMBERG/ 731 LEX) < >> mrathb...@bloomberg.net>: >> >>> For the requested full ignite log, where would this be found if we are >>> running using local mode? We are not explicitly running a separate ignite >>> node, and our WorkDirectory does not seem to have any logs >>> >>> From: u...@ignite.apache.org At: 12/03/19 19:00:18 >>> To: u...@ignite.apache.org >>> Subject: Re: IgniteOutOfMemoryException in LOCAL cache mode with >>> persistence enabled >>> >>> For our configuration properties, our DataRegion initialSize and MaxSize >>> was set to 11 MB and persistence was enabled. For DataStorage, our pageSize >>> was set to 8192 instead of 4096. 
For Cache, write behind is disabled, on >>> heap cache is disabled, and Atomicity Mode is Atomic >>> >>> From: u...@ignite.apache.org At: 12/03/19 13:40:32 >>> To: u...@ignite.apache.org >>> Subject: Re: IgniteOutOfMemoryException in LOCAL cache mode with >>> persistence enabled >>> >>> Hi Mitchell, >>> >>> Looks like it could be easily reproduced on low off-heap sizes, I tried >>> with >>> simple puts and got the same exception: >>> >>> class org.apache.ignite.internal.mem.IgniteOutOfMemoryException: Failed >>> to >>> find a page for eviction [segmentCapacity=1580, loaded=619, >>> maxDirtyPages=465, dirtyPages=619, cpPages=0, pinnedInSegment=0, >>> failedToPrepare=620] >>> Out of memory in data region [name=Default_Region, initSize=10.0 MiB, >>> maxSize=10.0 MiB, persistenceEnabled=true] Try the following: >>> ^-- Increase maximum off-heap memory size >>> (DataRegionConfiguration.maxSize) >>> ^-- Enable Ignite persistence >>> (DataRegionConfiguration.persistenceEnabled) >>> ^-- Enable eviction or expiration policies >>> >>> It looks like Ignite must issue a proper warning in this case and couple >>> of >>> issues must be filed against Ignite JIRA. >>> >>> Check out this article on persistent store available in Ignite >>> confluence as >>> well: >>> >>> https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+und >>> er+the+hood#IgnitePersistentStore-underthehood-Checkpointing >>> >>> I've managed to make kind of similar example working with 20 Mb region >>> with >>> a bit of tuning, added following properties to >>> org.apache.ignite.configuration.DataStorageConfiguration: >>> / >>> / >>> >>> The whole idea behind this is to trigger checkpoint on timeout rather >>> than >>> on too much dirty pages percentage threshold. The checkpoint page buffer >>> size may not exceed data region size, which is 10 Mb, which might be >>> overflown during checkpoint as well. 
>>> >>> I assume that checkpoint is never triggered in this case because of >>> per-partition overhead: Ignite writes some meta per partition and it
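The meta-page theory discussed in this thread can be checked with a rough back-of-envelope estimate. The numbers below are assumptions for illustration only: at least one meta page per initialized partition (as suggested above), the 8192-byte page size from the reported configuration, and a hypothetical 1024 partitions per cache:

```java
// Rough estimate of per-partition meta-page overhead vs. data region size.
// Assumes >= 1 meta page per initialized partition, as discussed in this
// thread; real overhead varies by Ignite version and configuration.
public class MetaPageEstimate {
    public static long metaPageBytes(int caches, int partitionsPerCache, int pageSize) {
        return (long) caches * partitionsPerCache * pageSize;
    }

    public static void main(String[] args) {
        int pageSize = 8192;                   // page size from the reported configuration
        int partitions = 1024;                 // hypothetical partition count per cache
        long regionBytes = 11L * 1024 * 1024;  // 11 MB region from the report

        long oneCache = metaPageBytes(1, partitions, pageSize);
        long twoCaches = metaPageBytes(2, partitions, pageSize);

        System.out.println("one cache:  " + oneCache + " bytes, fits=" + (oneCache < regionBytes));
        System.out.println("two caches: " + twoCaches + " bytes, fits=" + (twoCaches < regionBytes));
    }
}
```

Under these assumptions one cache's meta pages (8 MiB) still fit in an 11 MB region, while a second cache pushes the total to 16 MiB, which does not — consistent with the reported behavior of the test passing with one cache and failing with two.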
Re: Adding support for Ignite secondary indexes to Apache Calcite planner
Roman, What I am trying to understand is what advantage of the materialization API you see over the normal optimization process. Does it save optimization time, reduce memory footprint, or maybe provide better plans? I am asking because I do not see how expressing indexes as materializations fits the classical optimization process. We discussed the Sort <- Scan optimization. Let's consider another example: LogicalSort[a ASC] LogicalJoin Initially, you do not know the implementation of the join, and hence do not know its collation. Then you may execute physical join rules, which produce, say, PhysicalMergeJoin[a ASC]. If you execute the sort implementation rule afterwards, you may easily eliminate the sort, or make it simpler (e.g. remove the local sorting phase), depending on the distribution. In other words, a proper implementation of sorting optimization assumes that you have a kind of SortRemoveRule anyway, irrespective of whether you use materializations or not, because a sort may be injected on top of any operator. With this in mind, the use of materializations doesn't make the planner simpler, nor does it improve the outcome of the whole optimization process. What is left, then, is either lower CPU or RAM usage? Is this the case? ср, 11 дек. 2019 г. в 18:37, Roman Kondakov : > Vladimir, > > the main advantage of the Phoenix approach I can see is the using of > Calcite's native materializations API. Calcite has advanced support for > materializations [1] and lattices [2]. Since secondary indexes can be > considered as materialized views (it's just a sorted representation of > the same table) we can seamlessly use views to simulate indexes behavior > for Calcite planner. > > > [1] https://calcite.apache.org/docs/materialized_views.html > [2] https://calcite.apache.org/docs/lattice.html > > -- > Kind Regards > Roman Kondakov > > > On 11.12.2019 17:11, Vladimir Ozerov wrote: > > Roman, > > > > What is the advantage of Phoenix approach then?
BTW, it looks like > Phoenix > > integration with Calcite never made it to production, did it? > > > > вт, 10 дек. 2019 г. в 19:50, Roman Kondakov >: > > > >> Hi Vladimir, > >> > >> from what I understand, Drill does not exploit collation of indexes. To > >> be precise it does not exploit index collation in "natural" way where, > >> say, we a have sorted TableScan and hence we do not create a new Sort. > >> Instead of it Drill always create a Sort operator, but if TableScan can > >> be replaced with an IndexScan, this Sort operator is removed by the > >> dedicated rule. > >> > >> Lets consider initial an operator tree: > >> > >> Project > >> Sort > >> TableScan > >> > >> after applying rule DbScanToIndexScanPrule this tree will be converted > to: > >> > >> Project > >> Sort > >> IndexScan > >> > >> and finally, after applying DbScanSortRemovalRule we have: > >> > >> Project > >> IndexScan > >> > >> while for Phoenix approach we would have two equivalent subsets in our > >> planner: > >> > >> Project > >> Sort > >> TableScan > >> > >> and > >> > >> Project > >> IndexScan > >> > >> and most likely the last plan will be chosen as the best one. > >> > >> -- > >> Kind Regards > >> Roman Kondakov > >> > >> > >> On 10.12.2019 17:19, Vladimir Ozerov wrote: > >>> Hi Roman, > >>> > >>> Why do you think that Drill-style will not let you exploit collation? > >>> Collation should be propagated from the index scan in the same way as > in > >>> other sorted operators, such as merge join or streaming aggregate. > >> Provided > >>> that you use converter-hack (or any alternative solution to trigger > >> parent > >>> re-analysis). > >>> In other words, propagation of collation from Drill-style indexes > should > >> be > >>> no different from other sorted operators. > >>> > >>> Regards, > >>> Vladimir. > >>> > >>> вт, 10 дек. 2019 г. 
в 16:40, Zhenya Stanilovsky > >> : > >>> > > Roman just as fast remark, Phoenix builds their approach on > already existing monolith HBase architecture, most cases it`s just a > >> stub > for someone who wants use secondary indexes with a base with no > native support of it. Don`t think it`s good idea here. > > > > > > > --- Forwarded message --- > > From: "Roman Kondakov" < kondako...@mail.ru.invalid > > > To: dev@ignite.apache.org > > Cc: > > Subject: Adding support for Ignite secondary indexes to Apache > Calcite > > planner > > Date: Tue, 10 Dec 2019 15:55:52 +0300 > > > > Hi all! > > > > As you may know there is an activity on integration of Apache Calcite > > query optimizer into Ignite codebase is being carried out [1],[2]. > > > > One of a bunch of problems in this integration is the absence of > > out-of-the-box support for secondary indexes in Apache Calcite. After > > some research I came to conclusion that this problem has a couple of > > workarounds. Let's name them > > 1. Phoenix-style approach - representing
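The Drill-style rewrite described in this thread (replace the TableScan with a sorted IndexScan, then let a dedicated rule delete the now-redundant Sort) boils down to one check: a Sort is removable when its input already delivers the required collation. The following is an illustrative toy sketch of that check only — it is not Drill's or Calcite's actual rule API, and all names in it are made up:

```java
import java.util.List;

// Toy model of the DbScanSortRemovalRule idea from this thread: a Sort
// node is redundant iff the required sort keys are a prefix of the
// collation its input already provides.
public class SortRemovalSketch {
    static boolean sortIsRedundant(List<String> requiredKeys, List<String> inputCollation) {
        if (requiredKeys.size() > inputCollation.size())
            return false;
        // Prefix match: an input sorted by (a, b) satisfies Sort[a ASC].
        return inputCollation.subList(0, requiredKeys.size()).equals(requiredKeys);
    }

    public static void main(String[] args) {
        // Plain TableScan: no collation, so Sort[a ASC] must stay.
        System.out.println(sortIsRedundant(List.of("a"), List.of()));         // false
        // IndexScan sorted by (a, b): Sort[a ASC] can be removed.
        System.out.println(sortIsRedundant(List.of("a"), List.of("a", "b"))); // true
    }
}
```

The same prefix check also explains Vladimir's point upthread: any operator with a known output collation (a merge join, a streaming aggregate) can satisfy it, which is why a generic sort-removal rule is needed regardless of whether indexes are modeled as materializations.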
Re: IgniteOutOfMemoryException in LOCAL cache mode with persistence enabled
I tend to agree with Mitchell that the cluster should not crash. If the crash is unavoidable based on the current architecture then a message should be more descriptive. Ignite persistence experts, could you please join the conversation and shed more light to the reported behavior? - Denis On Wed, Dec 11, 2019 at 3:25 AM Mitchell Rathbun (BLOOMBERG/ 731 LEX) < mrathb...@bloomberg.net> wrote: > 2 GB is not reasonable for off heap memory for our use case. In general, > even if off-heap is very low, performance should just degrade and calls > should become blocking, I don't think that we should crash. Either way, the > issue seems to be with putAll, not concurrent updates of different caches > in the same data region. If I use Ignite's DataStreamer API instead of > putAll, I get much better performance and no OOM exception. Any insight > into why this might be would be appreciated. > > From: u...@ignite.apache.org At: 12/10/19 11:24:35 > To: Mitchell Rathbun (BLOOMBERG/ 731 LEX ) , > u...@ignite.apache.org > Subject: Re: IgniteOutOfMemoryException in LOCAL cache mode with > persistence enabled > > Hello! > > 10M is very very low-ball for testing performance of disk, considering how > Ignite's wal/checkpoints are structured. As already told, it does not even > work properly. > > I recommend using 2G value instead. Just load enough data so that you can > observe constant checkpoints. > > Regards, > -- > Ilya Kasnacheev > > > ср, 4 дек. 2019 г. в 03:16, Mitchell Rathbun (BLOOMBERG/ 731 LEX) < > mrathb...@bloomberg.net>: > >> For the requested full ignite log, where would this be found if we are >> running using local mode? 
We are not explicitly running a separate ignite >> node, and our WorkDirectory does not seem to have any logs >> >> From: u...@ignite.apache.org At: 12/03/19 19:00:18 >> To: u...@ignite.apache.org >> Subject: Re: IgniteOutOfMemoryException in LOCAL cache mode with >> persistence enabled >> >> For our configuration properties, our DataRegion initialSize and MaxSize >> was set to 11 MB and persistence was enabled. For DataStorage, our pageSize >> was set to 8192 instead of 4096. For Cache, write behind is disabled, on >> heap cache is disabled, and Atomicity Mode is Atomic >> >> From: u...@ignite.apache.org At: 12/03/19 13:40:32 >> To: u...@ignite.apache.org >> Subject: Re: IgniteOutOfMemoryException in LOCAL cache mode with >> persistence enabled >> >> Hi Mitchell, >> >> Looks like it could be easily reproduced on low off-heap sizes, I tried >> with >> simple puts and got the same exception: >> >> class org.apache.ignite.internal.mem.IgniteOutOfMemoryException: Failed to >> find a page for eviction [segmentCapacity=1580, loaded=619, >> maxDirtyPages=465, dirtyPages=619, cpPages=0, pinnedInSegment=0, >> failedToPrepare=620] >> Out of memory in data region [name=Default_Region, initSize=10.0 MiB, >> maxSize=10.0 MiB, persistenceEnabled=true] Try the following: >> ^-- Increase maximum off-heap memory size >> (DataRegionConfiguration.maxSize) >> ^-- Enable Ignite persistence (DataRegionConfiguration.persistenceEnabled) >> ^-- Enable eviction or expiration policies >> >> It looks like Ignite must issue a proper warning in this case and couple >> of >> issues must be filed against Ignite JIRA. 
>> >> Check out this article on persistent store available in Ignite confluence >> as >> well: >> >> https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+und >> er+the+hood#IgnitePersistentStore-underthehood-Checkpointing >> >> I've managed to make kind of similar example working with 20 Mb region >> with >> a bit of tuning, added following properties to >> org.apache.ignite.configuration.DataStorageConfiguration: >> / >> / >> >> The whole idea behind this is to trigger checkpoint on timeout rather than >> on too much dirty pages percentage threshold. The checkpoint page buffer >> size may not exceed data region size, which is 10 Mb, which might be >> overflown during checkpoint as well. >> >> I assume that checkpoint is never triggered in this case because of >> per-partition overhead: Ignite writes some meta per partition and it looks >> like that it is at least 1 meta page utilized for each which results in >> some >> amount of off-heap devoured by these meta pages. In the case with the >> lowest >> possible region size, this might consume more than 3 Mb for cache with 1k >> partitions and 70% dirty data pages threshold would never be reached. >> >> However, I found another issue when it is not possible to save meta page >> on >> checkpoint begin, this reproduces on 10 Mb data region with mentioned >> storage configuration options. >> >> Could you please describe the configuration if you have anything different >> from defaults (page size, wal mode, partitions count) and types of >> key/value >> that you use? And if it is possible, could you please attach full Ignite >> log >> from the node that has suffered from IOOM? >> >> As for the data region/cache, in reality you do also
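The concrete properties were stripped from the archived message above (only empty "/ /" placeholders survive). Purely as a hedged illustration of the kind of tuning described — checkpoint on a timer rather than on the dirty-pages threshold, with an explicit checkpoint page buffer no larger than the region — a Java configuration might look like this; the specific values are guesses, not the ones the author actually used:

```java
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class SmallRegionTuning {
    public static IgniteConfiguration config() {
        DataRegionConfiguration region = new DataRegionConfiguration()
            .setName("Default_Region")
            .setInitialSize(20L * 1024 * 1024)   // 20 MB region, per the example above
            .setMaxSize(20L * 1024 * 1024)
            .setPersistenceEnabled(true)
            // Keep the checkpoint page buffer within the (tiny) region size.
            .setCheckpointPageBufferSize(10L * 1024 * 1024);

        DataStorageConfiguration storage = new DataStorageConfiguration()
            .setPageSize(8192)
            // Checkpoint on a timer (illustrative value) so the dirty-page
            // percentage threshold is never the trigger.
            .setCheckpointFrequency(3_000)
            .setDefaultDataRegionConfiguration(region);

        return new IgniteConfiguration().setDataStorageConfiguration(storage);
    }
}
```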
Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)
I will look into the MOVING partition issue, but I also need some guidance there. Ivan, would you mind being that person? The question is whether we have an issue with: - wrong storing targets during indexing, OR - incorrect node/partition selection during querying? BR, Yuriy Shluiha -- Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
Re: Adding support for Ignite secondary indexes to Apache Calcite planner
Vladimir, the main advantage of the Phoenix approach, as I see it, is the use of Calcite's native materializations API. Calcite has advanced support for materializations [1] and lattices [2]. Since a secondary index can be considered a materialized view (it's just a sorted representation of the same table), we can seamlessly use views to simulate index behavior for the Calcite planner. [1] https://calcite.apache.org/docs/materialized_views.html [2] https://calcite.apache.org/docs/lattice.html -- Kind Regards Roman Kondakov On 11.12.2019 17:11, Vladimir Ozerov wrote: > Roman, > > What is the advantage of Phoenix approach then? BTW, it looks like Phoenix > integration with Calcite never made it to production, did it? > > вт, 10 дек. 2019 г. в 19:50, Roman Kondakov : > >> Hi Vladimir, >> >> from what I understand, Drill does not exploit collation of indexes. To >> be precise it does not exploit index collation in "natural" way where, >> say, we a have sorted TableScan and hence we do not create a new Sort. >> Instead of it Drill always create a Sort operator, but if TableScan can >> be replaced with an IndexScan, this Sort operator is removed by the >> dedicated rule. >> >> Lets consider initial an operator tree: >> >> Project >> Sort >> TableScan >> >> after applying rule DbScanToIndexScanPrule this tree will be converted to: >> >> Project >> Sort >> IndexScan >> >> and finally, after applying DbScanSortRemovalRule we have: >> >> Project >> IndexScan >> >> while for Phoenix approach we would have two equivalent subsets in our >> planner: >> >> Project >> Sort >> TableScan >> >> and >> >> Project >> IndexScan >> >> and most likely the last plan will be chosen as the best one. >> >> -- >> Kind Regards >> Roman Kondakov >> >> >> On 10.12.2019 17:19, Vladimir Ozerov wrote: >>> Hi Roman, >>> >>> Why do you think that Drill-style will not let you exploit collation?
>>> Collation should be propagated from the index scan in the same way as in >>> other sorted operators, such as merge join or streaming aggregate. >> Provided >>> that you use converter-hack (or any alternative solution to trigger >> parent >>> re-analysis). >>> In other words, propagation of collation from Drill-style indexes should >> be >>> no different from other sorted operators. >>> >>> Regards, >>> Vladimir. >>> >>> вт, 10 дек. 2019 г. в 16:40, Zhenya Stanilovsky >> >>> : >>> Roman just as fast remark, Phoenix builds their approach on already existing monolith HBase architecture, most cases it`s just a >> stub for someone who wants use secondary indexes with a base with no native support of it. Don`t think it`s good idea here. > > > --- Forwarded message --- > From: "Roman Kondakov" < kondako...@mail.ru.invalid > > To: dev@ignite.apache.org > Cc: > Subject: Adding support for Ignite secondary indexes to Apache Calcite > planner > Date: Tue, 10 Dec 2019 15:55:52 +0300 > > Hi all! > > As you may know there is an activity on integration of Apache Calcite > query optimizer into Ignite codebase is being carried out [1],[2]. > > One of a bunch of problems in this integration is the absence of > out-of-the-box support for secondary indexes in Apache Calcite. After > some research I came to conclusion that this problem has a couple of > workarounds. Let's name them > 1. Phoenix-style approach - representing secondary indexes as > materialized views which are natively supported by Calcite engine [3] > 2. Drill-style approach - pushing filters into the table scans and > choose appropriate index for lookups when possible [4] > > Both these approaches have advantages and disadvantages: > > Phoenix style pros: > - natural way of adding indexes as an alternative source of rows: index > can be considered as a kind of sorted materialized view. > - possibility of using index sortedness for stream aggregates, > deduplication (DISTINCT operator), merge joins, etc. 
> - ability to support other types of indexes (i.e. functional indexes). > > Phoenix style cons: > - polluting optimizer's search space extra table scans hence increasing > the planning time. > > Drill style pros: > - easier to implement (although it's questionable). > - search space is not inflated. > > Drill style cons: > - missed opportunity to exploit sortedness. > > There is a good discussion about using both approaches can be found in [5]. > > I made a small sketch [6] in order to demonstrate the applicability of > the Phoenix approach to Ignite. Key design concepts are: > 1. On creating indexes are registered as tables in Calcite schema. This > step is needed for internal Calcite's routines. > 2. On planner initialization we register these indexes as materialized > views in Calcite's optimizer using
[jira] [Created] (IGNITE-12437) .NET: Run tests on macOS TeamCity agent
Pavel Tupitsyn created IGNITE-12437: --- Summary: .NET: Run tests on macOS TeamCity agent Key: IGNITE-12437 URL: https://issues.apache.org/jira/browse/IGNITE-12437 Project: Ignite Issue Type: Task Components: platforms Reporter: Pavel Tupitsyn Assignee: Pavel Tupitsyn There is one macOS agent on TC. Run tests there to ensure full support of Ignite.NET on macOS. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: Adding support for Ignite secondary indexes to Apache Calcite planner
Vladimir, You are right Phoenix integration with Calcite stalled halfway. See [1] to get some reasons. [1] https://lists.apache.org/thread.html/0152a97bfebb85c74f10e26e94ab9cd416dec374abba7dc2e1af9d61%40%3Cdev.phoenix.apache.org%3E ср, 11 дек. 2019 г. в 17:11, Vladimir Ozerov : > > Roman, > > What is the advantage of Phoenix approach then? BTW, it looks like Phoenix > integration with Calcite never made it to production, did it? > > вт, 10 дек. 2019 г. в 19:50, Roman Kondakov : > > > Hi Vladimir, > > > > from what I understand, Drill does not exploit collation of indexes. To > > be precise it does not exploit index collation in "natural" way where, > > say, we a have sorted TableScan and hence we do not create a new Sort. > > Instead of it Drill always create a Sort operator, but if TableScan can > > be replaced with an IndexScan, this Sort operator is removed by the > > dedicated rule. > > > > Lets consider initial an operator tree: > > > > Project > > Sort > > TableScan > > > > after applying rule DbScanToIndexScanPrule this tree will be converted to: > > > > Project > > Sort > > IndexScan > > > > and finally, after applying DbScanSortRemovalRule we have: > > > > Project > > IndexScan > > > > while for Phoenix approach we would have two equivalent subsets in our > > planner: > > > > Project > > Sort > > TableScan > > > > and > > > > Project > > IndexScan > > > > and most likely the last plan will be chosen as the best one. > > > > -- > > Kind Regards > > Roman Kondakov > > > > > > On 10.12.2019 17:19, Vladimir Ozerov wrote: > > > Hi Roman, > > > > > > Why do you think that Drill-style will not let you exploit collation? > > > Collation should be propagated from the index scan in the same way as in > > > other sorted operators, such as merge join or streaming aggregate. > > Provided > > > that you use converter-hack (or any alternative solution to trigger > > parent > > > re-analysis). 
> > > In other words, propagation of collation from Drill-style indexes should > > be > > > no different from other sorted operators. > > > > > > Regards, > > > Vladimir. > > > > > > вт, 10 дек. 2019 г. в 16:40, Zhenya Stanilovsky > > > >> : > > > > > >> > > >> Roman just as fast remark, Phoenix builds their approach on > > >> already existing monolith HBase architecture, most cases it`s just a > > stub > > >> for someone who wants use secondary indexes with a base with no > > >> native support of it. Don`t think it`s good idea here. > > >> > > >>> > > >>> > > >>> --- Forwarded message --- > > >>> From: "Roman Kondakov" < kondako...@mail.ru.invalid > > > >>> To: dev@ignite.apache.org > > >>> Cc: > > >>> Subject: Adding support for Ignite secondary indexes to Apache Calcite > > >>> planner > > >>> Date: Tue, 10 Dec 2019 15:55:52 +0300 > > >>> > > >>> Hi all! > > >>> > > >>> As you may know there is an activity on integration of Apache Calcite > > >>> query optimizer into Ignite codebase is being carried out [1],[2]. > > >>> > > >>> One of a bunch of problems in this integration is the absence of > > >>> out-of-the-box support for secondary indexes in Apache Calcite. After > > >>> some research I came to conclusion that this problem has a couple of > > >>> workarounds. Let's name them > > >>> 1. Phoenix-style approach - representing secondary indexes as > > >>> materialized views which are natively supported by Calcite engine [3] > > >>> 2. Drill-style approach - pushing filters into the table scans and > > >>> choose appropriate index for lookups when possible [4] > > >>> > > >>> Both these approaches have advantages and disadvantages: > > >>> > > >>> Phoenix style pros: > > >>> - natural way of adding indexes as an alternative source of rows: index > > >>> can be considered as a kind of sorted materialized view. > > >>> - possibility of using index sortedness for stream aggregates, > > >>> deduplication (DISTINCT operator), merge joins, etc. 
> > >>> - ability to support other types of indexes (i.e. functional indexes). > > >>> > > >>> Phoenix style cons: > > >>> - polluting optimizer's search space extra table scans hence increasing > > >>> the planning time. > > >>> > > >>> Drill style pros: > > >>> - easier to implement (although it's questionable). > > >>> - search space is not inflated. > > >>> > > >>> Drill style cons: > > >>> - missed opportunity to exploit sortedness. > > >>> > > >>> There is a good discussion about using both approaches can be found in > > >> [5]. > > >>> > > >>> I made a small sketch [6] in order to demonstrate the applicability of > > >>> the Phoenix approach to Ignite. Key design concepts are: > > >>> 1. On creating indexes are registered as tables in Calcite schema. This > > >>> step is needed for internal Calcite's routines. > > >>> 2. On planner initialization we register these indexes as materialized > > >>> views in Calcite's optimizer using VolcanoPlanner#addMaterialization > > >>> method. > > >>> 3. Right before the query execution Calcite selects all
Re: Adding support for Ignite secondary indexes to Apache Calcite planner
Roman, What is the advantage of Phoenix approach then? BTW, it looks like Phoenix integration with Calcite never made it to production, did it? вт, 10 дек. 2019 г. в 19:50, Roman Kondakov : > Hi Vladimir, > > from what I understand, Drill does not exploit collation of indexes. To > be precise it does not exploit index collation in "natural" way where, > say, we a have sorted TableScan and hence we do not create a new Sort. > Instead of it Drill always create a Sort operator, but if TableScan can > be replaced with an IndexScan, this Sort operator is removed by the > dedicated rule. > > Lets consider initial an operator tree: > > Project > Sort > TableScan > > after applying rule DbScanToIndexScanPrule this tree will be converted to: > > Project > Sort > IndexScan > > and finally, after applying DbScanSortRemovalRule we have: > > Project > IndexScan > > while for Phoenix approach we would have two equivalent subsets in our > planner: > > Project > Sort > TableScan > > and > > Project > IndexScan > > and most likely the last plan will be chosen as the best one. > > -- > Kind Regards > Roman Kondakov > > > On 10.12.2019 17:19, Vladimir Ozerov wrote: > > Hi Roman, > > > > Why do you think that Drill-style will not let you exploit collation? > > Collation should be propagated from the index scan in the same way as in > > other sorted operators, such as merge join or streaming aggregate. > Provided > > that you use converter-hack (or any alternative solution to trigger > parent > > re-analysis). > > In other words, propagation of collation from Drill-style indexes should > be > > no different from other sorted operators. > > > > Regards, > > Vladimir. > > > > вт, 10 дек. 2019 г. в 16:40, Zhenya Stanilovsky > >> : > > > >> > >> Roman just as fast remark, Phoenix builds their approach on > >> already existing monolith HBase architecture, most cases it`s just a > stub > >> for someone who wants use secondary indexes with a base with no > >> native support of it. 
Don`t think it`s good idea here. > >> > >>> > >>> > >>> --- Forwarded message --- > >>> From: "Roman Kondakov" < kondako...@mail.ru.invalid > > >>> To: dev@ignite.apache.org > >>> Cc: > >>> Subject: Adding support for Ignite secondary indexes to Apache Calcite > >>> planner > >>> Date: Tue, 10 Dec 2019 15:55:52 +0300 > >>> > >>> Hi all! > >>> > >>> As you may know there is an activity on integration of Apache Calcite > >>> query optimizer into Ignite codebase is being carried out [1],[2]. > >>> > >>> One of a bunch of problems in this integration is the absence of > >>> out-of-the-box support for secondary indexes in Apache Calcite. After > >>> some research I came to conclusion that this problem has a couple of > >>> workarounds. Let's name them > >>> 1. Phoenix-style approach - representing secondary indexes as > >>> materialized views which are natively supported by Calcite engine [3] > >>> 2. Drill-style approach - pushing filters into the table scans and > >>> choose appropriate index for lookups when possible [4] > >>> > >>> Both these approaches have advantages and disadvantages: > >>> > >>> Phoenix style pros: > >>> - natural way of adding indexes as an alternative source of rows: index > >>> can be considered as a kind of sorted materialized view. > >>> - possibility of using index sortedness for stream aggregates, > >>> deduplication (DISTINCT operator), merge joins, etc. > >>> - ability to support other types of indexes (i.e. functional indexes). > >>> > >>> Phoenix style cons: > >>> - polluting optimizer's search space extra table scans hence increasing > >>> the planning time. > >>> > >>> Drill style pros: > >>> - easier to implement (although it's questionable). > >>> - search space is not inflated. > >>> > >>> Drill style cons: > >>> - missed opportunity to exploit sortedness. > >>> > >>> There is a good discussion about using both approaches can be found in > >> [5]. 
> >>> > >>> I made a small sketch [6] in order to demonstrate the applicability of > >>> the Phoenix approach to Ignite. Key design concepts are: > >>> 1. On creating indexes are registered as tables in Calcite schema. This > >>> step is needed for internal Calcite's routines. > >>> 2. On planner initialization we register these indexes as materialized > >>> views in Calcite's optimizer using VolcanoPlanner#addMaterialization > >>> method. > >>> 3. Right before the query execution Calcite selects all materialized > >>> views (indexes) which can be potentially used in query. > >>> 4. During the query optimization indexes are registered by planner as > >>> usual TableScans and hence can be chosen by optimizer if they have > lower > >>> cost. > >>> > >>> This sketch shows the ability to exploit index sortedness only. So the > >>> future work in this direction should be focused on using indexes for > >>> fast index lookups. At first glance FilterableTable and > >>> FilterTableScanRule are good points to start. We can push Filter into > >>> the
[jira] [Created] (IGNITE-12436) ignite.properties must be handled by maven update-version profile
Maxim Muzafarov created IGNITE-12436: Summary: ignite.properties must be handled by maven update-version profile Key: IGNITE-12436 URL: https://issues.apache.org/jira/browse/IGNITE-12436 Project: Ignite Issue Type: Bug Reporter: Maxim Muzafarov Assignee: Maxim Muzafarov Currently {{ignite\modules\core\src\main\resources\ignite.properties}} is edited manually to change the actually used {{ignite.version}} (e.g. when a new release version occurs). This should be done automatically when the update-version profile is active. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12435) [Spark] Add support for saving to existing table via saveAsTable
Alexey Zinoviev created IGNITE-12435: Summary: [Spark] Add support for saving to existing table via saveAsTable Key: IGNITE-12435 URL: https://issues.apache.org/jira/browse/IGNITE-12435 Project: Ignite Issue Type: Sub-task Components: spark Reporter: Alexey Zinoviev Assignee: Alexey Zinoviev Fix For: 2.9 Tests in IgniteSQLDataFrameIgniteSessionWriteSpec are muted due to a strange error related to working with filesystems and schemes. All three tests generate the same error when you try to call saveAsTable as a terminal operation on a dataframe write: java.io.IOException: No FileSystem for scheme: ignite at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2586) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2593) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.validateTableLocation(SessionCatalog.scala:333) at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:170) at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104) at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102) at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80) at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80) at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676) at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676) at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73) at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676) at org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:474) at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:449) at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:409) at org.apache.ignite.spark.IgniteSQLDataFrameIgniteSessionWriteSpec$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(IgniteSQLDataFrameIgniteSessionWriteSpec.scala:45) at org.apache.ignite.spark.IgniteSQLDataFrameIgniteSessionWriteSpec$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(IgniteSQLDataFrameIgniteSessionWriteSpec.scala:35) at org.apache.ignite.spark.IgniteSQLDataFrameIgniteSessionWriteSpec$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(IgniteSQLDataFrameIgniteSessionWriteSpec.scala:35) at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) 
at org.scalatest.FunSpecLike$$anon$1.apply(FunSpecLike.scala:422) at org.scalatest.Suite$class.withFixture(Suite.scala:1122) at org.scalatest.FunSpec.withFixture(FunSpec.scala:1626) at org.scalatest.FunSpecLike$class.invokeWithFixture$1(FunSpecLike.scala:419) at org.scalatest.FunSpecLike$$anonfun$runTest$1.apply(FunSpecLike.scala:431) at org.scalatest.FunSpecLike$$anonfun$runTest$1.apply(FunSpecLike.scala:431) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) at org.scalatest.FunSpecLike$class.runTest(FunSpecLike.scala:431) at org.apache.ignite.spark.AbstractDataFrameSpec.org$scalatest$BeforeAndAfter$$super$runTest(AbstractDataFrameSpec.scala:39) at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200) at
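The top frame, FileSystem.getFileSystemClass, fails because Hadoop has no FileSystem implementation registered for the "ignite" scheme: it resolves a scheme to an implementation class via configuration (the `fs.<scheme>.impl` key) and its service registry, and throws when nothing matches. The following is only a minimal self-contained sketch that mimics that lookup to illustrate why the error surfaces; it is not Hadoop code, and the class and method names are made up:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class SchemeRegistry {
    // Stand-in for Hadoop's Configuration: maps a URI scheme to a FileSystem
    // implementation class name, like the fs.<scheme>.impl config key.
    private final Map<String, String> impls = new HashMap<>();

    void register(String scheme, String implClass) {
        impls.put(scheme, implClass);
    }

    /** Mirrors the getFileSystemClass behavior: an unknown scheme throws IOException. */
    String lookup(String scheme) throws IOException {
        String impl = impls.get(scheme);
        if (impl == null)
            throw new IOException("No FileSystem for scheme: " + scheme);
        return impl;
    }
}
```

Under this reading, fixing the muted tests means making Spark's validateTableLocation path aware of the Ignite scheme (or avoiding the filesystem check for Ignite-backed tables), rather than changing the write logic itself.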
[jira] [Created] (IGNITE-12434) Dump checkpoint readLock holder threads if writeLock can`t take lock more than threshold timeout.
Stanilovsky Evgeny created IGNITE-12434:

Summary: Dump checkpoint readLock holder threads if writeLock can`t take lock more than threshold timeout.
Key: IGNITE-12434
URL: https://issues.apache.org/jira/browse/IGNITE-12434
Project: Ignite
Issue Type: Improvement
Components: persistence
Affects Versions: 2.7.6
Reporter: Stanilovsky Evgeny
Assignee: Stanilovsky Evgeny

Heavy cache operations such as removeAll, or hardware problems followed by a GC pause, can hold the checkpoint readLock for a long time. This forces a long wait on the writeLock and, as a result, a long checkpoint delay. It would be very informative to log such situations by dumping the readLock holder threads.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
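The proposed improvement can be sketched with plain JDK primitives: attempt the write lock for a bounded time, and if it cannot be acquired within the threshold, dump all live threads via ThreadMXBean so the readLock holders are visible in the log. This is only an illustration of the logging idea, not Ignite's actual checkpointer code; the class name and threshold value are made up:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CheckpointLockDump {
    // Hypothetical threshold: how long to wait on the writeLock before dumping threads.
    static final long THRESHOLD_MS = 100;

    /**
     * Tries to take the write lock within the threshold. Returns null on success,
     * otherwise a dump of all live threads (which includes the readLock holders).
     */
    static String acquireOrDump(ReentrantReadWriteLock lock) {
        long deadline = System.nanoTime() + THRESHOLD_MS * 1_000_000L;
        do {
            if (lock.writeLock().tryLock()) {
                lock.writeLock().unlock();
                return null; // acquired within threshold, nothing to log
            }
        } while (System.nanoTime() < deadline);

        StringBuilder sb = new StringBuilder("writeLock wait exceeded threshold; thread dump:\n");
        for (ThreadInfo ti : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false))
            sb.append(ti); // ThreadInfo.toString() includes the stack trace
        return sb.toString();
    }
}
```

In the real checkpointer the dump would go to the Ignite log rather than being returned, and the threshold would presumably be configurable.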
Re: Duplicate column name while creating table
Hello! I have filed an issue https://issues.apache.org/jira/browse/IGNITE-12433 Regards, -- Ilya Kasnacheev ср, 11 дек. 2019 г. в 12:43, Denis Magda : > It sounds like the implementation specificities of Ignite DML. > > SQL folks, how about throwing an exception in case of the duplicate name? > > - > Denis > > > On Thu, Dec 5, 2019 at 9:38 AM DS > wrote: > >> *I am able to create the table with duplicate column name.* >> >> I was expecting some error saying "cannot create table; duplicate column >> name: NAME" >> Is there some reason that Ignite is not throwing error/exception OR >> is it a bug? >> >> CREATE TABLE Person(ID INTEGER PRIMARY KEY, NAME VARCHAR(100), NAME >> VARCHAR(100), AGE INTEGER (64)); >> >> >> >> >> -- >> Sent from: http://apache-ignite-users.70518.x6.nabble.com/ >> >
[jira] [Created] (IGNITE-12433) Possible to create table with duplicate definition of column
Ilya Kasnacheev created IGNITE-12433:

Summary: Possible to create table with duplicate definition of column
Key: IGNITE-12433
URL: https://issues.apache.org/jira/browse/IGNITE-12433
Project: Ignite
Issue Type: Bug
Components: sql
Affects Versions: 2.8
Reporter: Ilya Kasnacheev

{code}
sqlline version 1.3.0
sqlline> !connect jdbc:ignite:thin://localhost
Enter username for jdbc:ignite:thin://localhost:
Enter password for jdbc:ignite:thin://localhost:
0: jdbc:ignite:thin://localhost> CREATE TABLE Person(ID INTEGER PRIMARY KEY, NAME VARCHAR(100), NAME
. . . . . . . . . . . . . . . .> VARCHAR(100), AGE INTEGER (64));
No rows affected (0,229 seconds)
0: jdbc:ignite:thin://localhost> select * from person;
+----+------+-----+
| ID | NAME | AGE |
+----+------+-----+
+----+------+-----+
No rows selected (0,073 seconds)
{code}

This is on master branch. Note that "NAME VARCHAR(100)" appears twice in the statement, yet the table is created without error.
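Until the bug is fixed, a client building DDL could guard against this itself by rejecting duplicate column names before sending the statement. A minimal sketch of such a check; `findDuplicate` is a hypothetical helper, not part of any Ignite API:

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class DuplicateColumnCheck {
    /**
     * Returns the first column name that repeats, or null if all names are distinct.
     * Comparison is case-insensitive, matching how SQL treats unquoted identifiers.
     */
    static String findDuplicate(String... columnNames) {
        Set<String> seen = new HashSet<>();
        for (String c : columnNames)
            if (!seen.add(c.toUpperCase(Locale.ROOT)))
                return c;
        return null;
    }
}
```

For the statement above, checking ("ID", "NAME", "NAME", "AGE") would flag "NAME" and the caller could refuse to issue the CREATE TABLE.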
[jira] [Created] (IGNITE-12432) [Spark] Need to add test for AVG function in IgniteOptimizationAggregationFuncSpec
Alexey Zinoviev created IGNITE-12432:

Summary: [Spark] Need to add test for AVG function in IgniteOptimizationAggregationFuncSpec
Key: IGNITE-12432
URL: https://issues.apache.org/jira/browse/IGNITE-12432
Project: Ignite
Issue Type: Test
Components: spark
Reporter: Alexey Zinoviev
Assignee: Alexey Zinoviev
Fix For: 2.9

The test is currently skipped with only a TODO:

{code}
it("AVG - DECIMAL") {
    //TODO: write me
}
{code}

The fix should be merged to the Spark 2.3 and 2.4 modules together.
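The semantics such a test would verify, averaging DECIMAL values without losing precision to floating point, can be sketched in plain Java with BigDecimal. This is a hand-rolled average for illustration only, not Spark's AVG implementation, and the choice of MathContext.DECIMAL64 is an assumption:

```java
import java.math.BigDecimal;
import java.math.MathContext;

public class DecimalAvg {
    /** Averages decimals exactly where possible, rounding to 16 significant digits otherwise. */
    static BigDecimal avg(BigDecimal... vals) {
        BigDecimal sum = BigDecimal.ZERO;
        for (BigDecimal v : vals)
            sum = sum.add(v);
        // Divide with an explicit MathContext: plain divide() throws ArithmeticException
        // for non-terminating results such as 1/3.
        return sum.divide(BigDecimal.valueOf(vals.length), MathContext.DECIMAL64);
    }
}
```

The actual Scala test would instead run `SELECT AVG(decimalCol) FROM table` through the Ignite-optimized plan and compare against the expected decimal, but the precision concern it guards is the one shown here.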
Re: Duplicate column name while creating table
This sounds like an implementation specificity of Ignite DML. SQL folks, how about throwing an exception in case of the duplicate name? - Denis On Thu, Dec 5, 2019 at 9:38 AM DS wrote: > *I am able to create the table with duplicate column name.* > > I was expecting some error saying "cannot create table; duplicate column > name: NAME" > Is there some reason that Ignite is not throwing error/exception OR > is it a bug? > > CREATE TABLE Person(ID INTEGER PRIMARY KEY, NAME VARCHAR(100), NAME > VARCHAR(100), AGE INTEGER (64)); > > > > > -- > Sent from: http://apache-ignite-users.70518.x6.nabble.com/ >