[jira] [Updated] (IGNITE-6057) SQL: Full scan should be performed through data pages bypassing primary index

2017-11-24 Thread Vladimir Ozerov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Ozerov updated IGNITE-6057:

Labels: performance  (was: iep-1 performance)

> SQL: Full scan should be performed through data pages bypassing primary index
> -
>
> Key: IGNITE-6057
> URL: https://issues.apache.org/jira/browse/IGNITE-6057
> Project: Ignite
> Issue Type: Task
> Components: persistence, sql
> Affects Versions: 2.1
> Reporter: Vladimir Ozerov
> Labels: performance
>
> Currently both SQL full scans and {{CREATE INDEX}} commands iterate through 
> the primary index to get all existing values. Suppose that we have 10 entries 
> per data page on average. In this case we will have to read the same data 
> page 10 times, once for each relevant key reached in a different part of the 
> index tree. This could be very inefficient on certain workloads.
> We should iterate over data pages directly instead. This way a page with 10 
> entries will be accessed only once. However, we should take cache groups into 
> account: if there are too many entries from other logical caches, this 
> approach could make the situation even worse, unless we have a mechanism to 
> skip unnecessary entries (or whole pages!) efficiently.
> We should probably develop a cost-based model that takes the following 
> statistics into account:
> 1) Average entry size. The larger the entry, the smaller the benefit, 
> especially if overflow pages are used frequently. 
> 2) Cache groups. Ideally, we should estimate the number of entries from all 
> logical caches. The more entries from other caches, the smaller the benefit.
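
The trade-off described above can be sketched as a small cost comparison. This is only an illustrative model, not Ignite code: the function names, the 4096-byte page size, and the simplification that an index-driven scan costs one data-page read per entry (plus overflow pages) while ignoring reads of the index pages themselves are all assumptions made for the example.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def index_scan_page_reads(own_entries: int, avg_entry_size: int, page_size: int) -> int:
    """Data-page reads when every entry is reached through the primary index.

    Assumption: each entry costs one read of its data page, plus extra
    reads when the entry spills into overflow pages.
    """
    pages_per_entry = max(1, ceil_div(avg_entry_size, page_size))
    return own_entries * pages_per_entry


def page_scan_page_reads(own_entries: int, other_entries: int,
                         avg_entry_size: int, page_size: int) -> int:
    """Data-page reads when pages are iterated directly.

    Assumption: pages are densely packed and entries of all logical caches
    in the cache group share the same pages, so every page must be visited.
    """
    total_bytes = (own_entries + other_entries) * avg_entry_size
    return ceil_div(total_bytes, page_size)


def prefer_page_scan(own_entries: int, other_entries: int,
                     avg_entry_size: int, page_size: int = 4096) -> bool:
    """Return True when the direct page scan is estimated to read fewer pages."""
    by_index = index_scan_page_reads(own_entries, avg_entry_size, page_size)
    by_pages = page_scan_page_reads(own_entries, other_entries, avg_entry_size, page_size)
    return by_pages < by_index


if __name__ == "__main__":
    # Roughly 10 entries per page, no other caches in the group.
    print(prefer_page_scan(1000, 0, 400))
    # Same entry size, but most entries belong to other logical caches.
    print(prefer_page_scan(1000, 20000, 400))
    # Large entries that need overflow pages.
    print(prefer_page_scan(1000, 0, 8192))
```

Under these assumptions the first case favors the page scan, while the second and third do not, which matches the two statistics listed in the description: the benefit shrinks as entries grow and as the share of entries from other logical caches increases.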



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (IGNITE-6057) SQL: Full scan should be performed through data pages bypassing primary index

2017-11-24 Thread Vladimir Ozerov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Ozerov updated IGNITE-6057:

Labels: performance  (was: iep-1 performance)

> SQL: Full scan should be performed through data pages bypassing primary index
> -
>
> Key: IGNITE-6057
> URL: https://issues.apache.org/jira/browse/IGNITE-6057
> Project: Ignite
> Issue Type: Task
> Components: persistence, sql
> Affects Versions: 2.1
> Reporter: Vladimir Ozerov
> Labels: iep-1, performance
>


[jira] [Updated] (IGNITE-6057) SQL: Full scan should be performed through data pages bypassing primary index

2017-11-24 Thread Vladimir Ozerov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Ozerov updated IGNITE-6057:

Labels: iep-1 performance  (was: performance)

> SQL: Full scan should be performed through data pages bypassing primary index
> -
>
> Key: IGNITE-6057
> URL: https://issues.apache.org/jira/browse/IGNITE-6057
> Project: Ignite
> Issue Type: Task
> Components: persistence, sql
> Affects Versions: 2.1
> Reporter: Vladimir Ozerov
> Labels: iep-1, performance
>


[jira] [Updated] (IGNITE-6057) SQL: Full scan should be performed through data pages bypassing primary index

2017-10-12 Thread Vladimir Ozerov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Ozerov updated IGNITE-6057:

Labels: performance  (was: iep-1 performance)

> SQL: Full scan should be performed through data pages bypassing primary index
> -
>
> Key: IGNITE-6057
> URL: https://issues.apache.org/jira/browse/IGNITE-6057
> Project: Ignite
> Issue Type: Task
> Components: persistence, sql
> Affects Versions: 2.1
> Reporter: Vladimir Ozerov
> Labels: iep-1, performance
>


[jira] [Updated] (IGNITE-6057) SQL: Full scan should be performed through data pages bypassing primary index

2017-10-12 Thread Vladimir Ozerov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Ozerov updated IGNITE-6057:

Labels: iep-1 performance  (was: performance)

> SQL: Full scan should be performed through data pages bypassing primary index
> -
>
> Key: IGNITE-6057
> URL: https://issues.apache.org/jira/browse/IGNITE-6057
> Project: Ignite
> Issue Type: Task
> Components: persistence, sql
> Affects Versions: 2.1
> Reporter: Vladimir Ozerov
> Labels: iep-1, performance
>


[jira] [Updated] (IGNITE-6057) SQL: Full scan should be performed through data pages bypassing primary index

2017-09-16 Thread Vladimir Ozerov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Ozerov updated IGNITE-6057:

Labels: iep-1 performance  (was: performance)

> SQL: Full scan should be performed through data pages bypassing primary index
> -
>
> Key: IGNITE-6057
> URL: https://issues.apache.org/jira/browse/IGNITE-6057
> Project: Ignite
> Issue Type: Task
> Components: persistence, sql
> Affects Versions: 2.1
> Reporter: Vladimir Ozerov
> Labels: iep-1, performance
>


[jira] [Updated] (IGNITE-6057) SQL: Full scan should be performed through data pages bypassing primary index

2017-08-22 Thread Vladimir Ozerov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Ozerov updated IGNITE-6057:

Fix Version/s: (was: 2.2)

> SQL: Full scan should be performed through data pages bypassing primary index
> -
>
> Key: IGNITE-6057
> URL: https://issues.apache.org/jira/browse/IGNITE-6057
> Project: Ignite
> Issue Type: Task
> Components: persistence, sql
> Affects Versions: 2.1
> Reporter: Vladimir Ozerov
> Labels: performance
>


[jira] [Updated] (IGNITE-6057) SQL: Full scan should be performed through data pages bypassing primary index

2017-08-22 Thread Vladimir Ozerov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Ozerov updated IGNITE-6057:

Issue Type: Task  (was: Improvement)

> SQL: Full scan should be performed through data pages bypassing primary index
> -
>
> Key: IGNITE-6057
> URL: https://issues.apache.org/jira/browse/IGNITE-6057
> Project: Ignite
> Issue Type: Task
> Components: persistence, sql
> Affects Versions: 2.1
> Reporter: Vladimir Ozerov
> Labels: performance
> Fix For: 2.2
>
>


[jira] [Updated] (IGNITE-6057) SQL: Full scan should be performed through data pages bypassing primary index

2017-08-14 Thread Vladimir Ozerov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Ozerov updated IGNITE-6057:

Labels: performance  (was: )

> SQL: Full scan should be performed through data pages bypassing primary index
> -
>
> Key: IGNITE-6057
> URL: https://issues.apache.org/jira/browse/IGNITE-6057
> Project: Ignite
> Issue Type: Improvement
> Components: persistence, sql
> Affects Versions: 2.1
> Reporter: Vladimir Ozerov
> Labels: performance
> Fix For: 2.2
>
>