[jira] [Commented] (IGNITE-7167) Optimize 'select count(*) from Table'

2018-02-13 Thread Vladimir Ozerov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-7167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16362003#comment-16362003
 ] 

Vladimir Ozerov commented on IGNITE-7167:
-

[~vkulichenko],
Regarding MVCC - when doing COUNT(*) you should count only the elements visible 
to your MVCC counter. The only way to achieve this is to count elements 
one-by-one, filtering out the following entries:
1) Entries for not yet committed transactions
2) Entries for aborted transactions
3) Entries for newer committed transactions which are not visible to the 
current transaction
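
The three filtering rules above can be sketched in plain Java. This is only an illustration under simplifying assumptions, not Ignite's MVCC implementation: {{MvccCountSketch}}, {{Version}}, {{TxState}} and the single numeric {{snapshotCounter}} are hypothetical names, and each entry is modeled as exactly one row version.

```java
import java.util.Arrays;
import java.util.List;

final class MvccCountSketch {
    /** Hypothetical transaction states, mirroring the three rules in the list. */
    enum TxState { ACTIVE, COMMITTED, ABORTED }

    /** One row version tagged with the counter and state of the writing transaction. */
    static final class Version {
        final long txCounter;
        final TxState state;

        Version(long txCounter, TxState state) {
            this.txCounter = txCounter;
            this.state = state;
        }
    }

    /** Applies the visibility rules for a reader holding the given snapshot counter. */
    static boolean isVisible(Version v, long snapshotCounter) {
        if (v.state != TxState.COMMITTED)
            return false; // rules 1 and 2: not yet committed, or aborted

        return v.txCounter <= snapshotCounter; // rule 3: committed after our snapshot
    }

    /** COUNT(*) under this model: every version has to be inspected, hence O(N). */
    static long countVisible(List<Version> versions, long snapshotCounter) {
        long cnt = 0;

        for (Version v : versions) {
            if (isVisible(v, snapshotCounter))
                cnt++;
        }

        return cnt;
    }

    public static void main(String[] args) {
        List<Version> versions = Arrays.asList(
            new Version(1, TxState.COMMITTED),
            new Version(2, TxState.COMMITTED),
            new Version(3, TxState.ACTIVE),
            new Version(4, TxState.ABORTED),
            new Version(7, TxState.COMMITTED));

        System.out.println(countVisible(versions, 5));
    }
}
```

Two readers with different snapshot counters get different counts from the same data, which is why a single stored size cannot answer the query.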

Certain optimizations exist, such as aggregating visibility info at the 
per-block level, but in the general case we still resort to some kind of 
iteration over elements (tuples or blocks), rather than reading a single number.
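
A minimal sketch of what per-block aggregation could look like, assuming a hypothetical layout in which every block caches the largest counter among its committed entries. The names {{BlockVisibilitySketch}}, {{Block}} and {{maxCommitted}} are invented for illustration and do not describe Ignite's storage format.

```java
final class BlockVisibilitySketch {
    /** A block of committed entries plus one cached aggregate (hypothetical layout). */
    static final class Block {
        final long[] committedCounters; // counters of the committed entries in the block
        final long maxCommitted;        // cached maximum of the counters above

        Block(long... committedCounters) {
            this.committedCounters = committedCounters.clone();

            long max = Long.MIN_VALUE;

            for (long c : this.committedCounters)
                max = Math.max(max, c);

            this.maxCommitted = max;
        }
    }

    /** Counts the entries visible to the snapshot, skipping per-entry checks where possible. */
    static long countVisible(Block[] blocks, long snapshotCounter) {
        long cnt = 0;

        for (Block b : blocks) {
            if (b.maxCommitted <= snapshotCounter) {
                // Fast path: the whole block is visible, so its entry count is enough.
                cnt += b.committedCounters.length;
            }
            else {
                // Slow path: the block mixes visible and invisible entries.
                for (long c : b.committedCounters) {
                    if (c <= snapshotCounter)
                        cnt++;
                }
            }
        }

        return cnt;
    }

    public static void main(String[] args) {
        Block[] blocks = {new Block(1L, 2L, 3L), new Block(4L, 9L), new Block(10L, 11L)};

        System.out.println(countVisible(blocks, 5));
    }
}
```

Even on the fast path the loop still visits every block, so the cost drops from the number of entries to the number of blocks, not to a constant.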

NB: When MVCC is enabled, {{IgniteCache.size()}} would also likely be an O(N) 
operation rather than O(1).

> Optimize 'select count(*) from Table'
> -
>
> Key: IGNITE-7167
> URL: https://issues.apache.org/jira/browse/IGNITE-7167
> Project: Ignite
>  Issue Type: Improvement
>  Components: sql
>Affects Versions: 2.3
>Reporter: Valentin Kulichenko
>Priority: Major
>
> Currently a query like {{select count(*) from Table}} effectively scans the 
> cache and takes a lot of time for large datasets. It probably makes sense to 
> optimize it to use {{IgniteCache#size}} directly when possible.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-7167) Optimize 'select count(*) from Table'

2018-02-12 Thread Valentin Kulichenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-7167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16361366#comment-16361366
 ] 

Valentin Kulichenko commented on IGNITE-7167:
-

[~vozerov],
 # But we're going to disallow this eventually, right? For now, is it possible 
to track this somehow and use {{size()}} when possible? Having several types in 
a cache is very rare now, so doing a scan in this case is OK.
 # Can you elaborate on this? What exactly doesn't work?

In general, if {{size()}} is not an appropriate solution, maybe there is 
another one? From my experience with other DBs, this query never takes a lot of 
time even for large tables. And this seems to cause confusion for our users as 
well. It would be great if we came up with something here, even if not right now.


[jira] [Commented] (IGNITE-7167) Optimize 'select count(*) from Table'

2018-02-12 Thread Vladimir Ozerov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-7167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16360409#comment-16360409
 ] 

Vladimir Ozerov commented on IGNITE-7167:
-

[~vkulichenko], {{IgniteCache.size}} is not appropriate for two reasons:
1) We still allow multiple tables in the same cache, so {{size()}} will 
return the number of entries from all tables
2) It doesn't work for the MVCC case
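
The first reason can be illustrated with a plain Java map standing in for the cache. This is a sketch under stated assumptions rather than Ignite code: {{SharedCacheSizeSketch}}, {{Person}} and {{Organization}} are hypothetical names, with the two value types playing the role of two tables stored in one cache.

```java
import java.util.HashMap;
import java.util.Map;

final class SharedCacheSizeSketch {
    /** Two hypothetical value types, each acting as a separate SQL table. */
    static final class Person { }
    static final class Organization { }

    /** Counts only the entries whose value belongs to the given table type. */
    static long countOfType(Map<?, ?> cache, Class<?> valType) {
        long cnt = 0;

        for (Object val : cache.values()) {
            if (valType.isInstance(val))
                cnt++;
        }

        return cnt;
    }

    /** Builds a small cache that holds rows of both tables. */
    static Map<String, Object> sampleCache() {
        Map<String, Object> cache = new HashMap<>();

        cache.put("p1", new Person());
        cache.put("p2", new Person());
        cache.put("o1", new Organization());

        return cache;
    }

    public static void main(String[] args) {
        Map<String, Object> cache = sampleCache();

        // The total size mixes both tables; the per-type count needs a scan.
        System.out.println(cache.size() + " entries in total, "
            + countOfType(cache, Person.class) + " of them in the Person table");
    }
}
```

Here the map's size is 3 while the Person table has only 2 rows, so returning the size for {{select count(*) from Person}} would be wrong whenever the cache holds more than one type.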


[jira] [Commented] (IGNITE-7167) Optimize 'select count(*) from Table'

2018-02-06 Thread Valentin Kulichenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-7167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354756#comment-16354756
 ] 

Valentin Kulichenko commented on IGNITE-7167:
-

[~vozerov], I actually meant the specific case when a query exactly like the 
one in the title is executed, without any WHERE clauses or anything else. Why 
can't we just call {{IgniteCache#size}} in this case?
