Re: [sqlite] Why is this so much more efficient?
Aliases in SQL are not the same as variables in most procedural languages. So every time you mention 'totaltime' SQLite is probably recalculating that value by adding all the columns together. See the various discussions regarding no deterministic (random) functions last year. Less references to that alias => less calculations. Reference it twice and expect it to be slower. > On 17 Feb 2017, at 12:27 AM, Darko Volaric wrote: > > You can actually index functions or expression: > https://www.sqlite.org/expridx.html > > On Thu, Feb 16, 2017 at 9:32 PM, Cecil Westerhof > wrote: > >> 2017-02-16 21:10 GMT+01:00 Dominique Pellé : >> >>> Cecil Westerhof wrote: >>> I have a table vmstat that I use to store vmstat info. ;-) At the moment it has more as 661 thousand records. In principle the values of usertime, systemtime, idletime, waittime and stolentime should add up to 100. I just wanted to check it. Of-course >>> there could be a rounding error, so I wrote the following query: SELECT date , time , usertime , systemtime , idletime , waittime , stolentime , (usertime + systemtime + idletime + waittime + stolentime) AS totaltime FROM vmstat WHERE totaltime < 99 OR totaltime > 101 I did not like that, so I rewrote the WHERE to: WHERE ABS(100 - totaltime) > 1 The funny thing the second WHERE is more efficient as the first, where >> I would have expected it to be the other way around. The first takes around 1.050 milliseconds. The second takes around 950 milliseconds. So the second is around 10% more efficient. Why is this? In case it is important: I did this in sqlitebrowser 3.7.0, which uses SQLite 3.8.10.2. >>> >>> >>> I suspect that when you use "WHERE ABS(100 - totaltime) > 1" >>> SQLite cannot use an index since it's an expression, so it does a >>> full table scan, whereas the other solution which does >>> "WHERE totaltime < 99 OR totaltime > 101" >>> may use an index on totaltime (assuming that there is an index). >>> >>> In general using an index is good. But if most of the records >>> satisfy the condition "ABS(100 - totaltime) > 1" then an index >>> can be more harmful than useful. And that could explain >>> why using "WHERE ABS(100 - totaltime) > 1" is a bit faster. >>> You could try "EXPLAIN QUERY PLAN" on your queries to >>> see if they use an index or if they do a full table scan. >> >> Totaltime is calculated, so it cannot have an index. ;-) >> Besides from the almost 700.000 records only two satisfy the condition. >> >> I should look into EXPLAIN QUERY PLAN. >> >> -- >> Cecil Westerhof >> ___ >> sqlite-users mailing list >> sqlite-users@mailinglists.sqlite.org >> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > ___ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Why is this so much more efficient?
You can actually index functions or expression: https://www.sqlite.org/expridx.html On Thu, Feb 16, 2017 at 9:32 PM, Cecil Westerhof wrote: > 2017-02-16 21:10 GMT+01:00 Dominique Pellé : > > > Cecil Westerhof wrote: > > > > > I have a table vmstat that I use to store vmstat info. ;-) > > > At the moment it has more as 661 thousand records. > > > > > > In principle the values of usertime, systemtime, idletime, waittime and > > > stolentime should add up to 100. I just wanted to check it. Of-course > > there > > > could be a rounding error, so I wrote the following query: > > > SELECT date > > > , time > > > , usertime > > > , systemtime > > > , idletime > > > , waittime > > > , stolentime > > > , (usertime + systemtime + idletime + waittime + stolentime) AS > > > totaltime > > > FROM vmstat > > > WHERE totaltime < 99 OR totaltime > 101 > > > > > > I did not like that, so I rewrote the WHERE to: > > > WHERE ABS(100 - totaltime) > 1 > > > > > > The funny thing the second WHERE is more efficient as the first, where > I > > > would have expected it to be the other way around. > > > The first takes around 1.050 milliseconds. > > > The second takes around 950 milliseconds. > > > So the second is around 10% more efficient. Why is this? > > > > > > In case it is important: I did this in sqlitebrowser 3.7.0, which uses > > > SQLite 3.8.10.2. > > > > > > I suspect that when you use "WHERE ABS(100 - totaltime) > 1" > > SQLite cannot use an index since it's an expression, so it does a > > full table scan, whereas the other solution which does > > "WHERE totaltime < 99 OR totaltime > 101" > > may use an index on totaltime (assuming that there is an index). > > > > In general using an index is good. But if most of the records > > satisfy the condition "ABS(100 - totaltime) > 1" then an index > > can be more harmful than useful. And that could explain > > why using "WHERE ABS(100 - totaltime) > 1" is a bit faster. > > You could try "EXPLAIN QUERY PLAN" on your queries to > > see if they use an index or if they do a full table scan. > > > > Totaltime is calculated, so it cannot have an index. ;-) > Besides from the almost 700.000 records only two satisfy the condition. > > I should look into EXPLAIN QUERY PLAN. > > -- > Cecil Westerhof > ___ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Why is this so much more efficient?
2017-02-16 21:10 GMT+01:00 Dominique Pellé : > Cecil Westerhof wrote: > > > I have a table vmstat that I use to store vmstat info. ;-) > > At the moment it has more as 661 thousand records. > > > > In principle the values of usertime, systemtime, idletime, waittime and > > stolentime should add up to 100. I just wanted to check it. Of-course > there > > could be a rounding error, so I wrote the following query: > > SELECT date > > , time > > , usertime > > , systemtime > > , idletime > > , waittime > > , stolentime > > , (usertime + systemtime + idletime + waittime + stolentime) AS > > totaltime > > FROM vmstat > > WHERE totaltime < 99 OR totaltime > 101 > > > > I did not like that, so I rewrote the WHERE to: > > WHERE ABS(100 - totaltime) > 1 > > > > The funny thing the second WHERE is more efficient as the first, where I > > would have expected it to be the other way around. > > The first takes around 1.050 milliseconds. > > The second takes around 950 milliseconds. > > So the second is around 10% more efficient. Why is this? > > > > In case it is important: I did this in sqlitebrowser 3.7.0, which uses > > SQLite 3.8.10.2. > > > I suspect that when you use "WHERE ABS(100 - totaltime) > 1" > SQLite cannot use an index since it's an expression, so it does a > full table scan, whereas the other solution which does > "WHERE totaltime < 99 OR totaltime > 101" > may use an index on totaltime (assuming that there is an index). > > In general using an index is good. But if most of the records > satisfy the condition "ABS(100 - totaltime) > 1" then an index > can be more harmful than useful. And that could explain > why using "WHERE ABS(100 - totaltime) > 1" is a bit faster. > You could try "EXPLAIN QUERY PLAN" on your queries to > see if they use an index or if they do a full table scan. > Totaltime is calculated, so it cannot have an index. ;-) Besides from the almost 700.000 records only two satisfy the condition. I should look into EXPLAIN QUERY PLAN. -- Cecil Westerhof ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Why is this so much more efficient?
Cecil Westerhof wrote: > I have a table vmstat that I use to store vmstat info. ;-) > At the moment it has more as 661 thousand records. > > In principle the values of usertime, systemtime, idletime, waittime and > stolentime should add up to 100. I just wanted to check it. Of-course there > could be a rounding error, so I wrote the following query: > SELECT date > , time > , usertime > , systemtime > , idletime > , waittime > , stolentime > , (usertime + systemtime + idletime + waittime + stolentime) AS > totaltime > FROM vmstat > WHERE totaltime < 99 OR totaltime > 101 > > I did not like that, so I rewrote the WHERE to: > WHERE ABS(100 - totaltime) > 1 > > The funny thing the second WHERE is more efficient as the first, where I > would have expected it to be the other way around. > The first takes around 1.050 milliseconds. > The second takes around 950 milliseconds. > So the second is around 10% more efficient. Why is this? > > In case it is important: I did this in sqlitebrowser 3.7.0, which uses > SQLite 3.8.10.2. I suspect that when you use "WHERE ABS(100 - totaltime) > 1" SQLite cannot use an index since it's an expression, so it does a full table scan, whereas the other solution which does "WHERE totaltime < 99 OR totaltime > 101" may use an index on totaltime (assuming that there is an index). In general using an index is good. But if most of the records satisfy the condition "ABS(100 - totaltime) > 1" then an index can be more harmful than useful. And that could explain why using "WHERE ABS(100 - totaltime) > 1" is a bit faster. You could try "EXPLAIN QUERY PLAN" on your queries to see if they use an index or if they do a full table scan. Dominique ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Why is this so much more efficient?
Basically, the DB size and compactness state influences the speed at which values are read. (Every update implies a read). Add to that the fact that these functions verge on the bottom edge of time consumers... It's like testing diffusion speed of a perfume in a hurricane. Glad you found the close competition between most-optimal solutions. On 2017/02/15 8:43 PM, Cecil Westerhof wrote: 2017-02-15 14:18 GMT+01:00 Cecil Westerhof : The OR version is the least efficient and it look likes the BETWEEN version 2 is the most efficient. It looks like it uses less user and more sys. Which is the most efficient is also dependent on the state of the database itself. I compacted the database and now sometimes NOT BETWEEN is more efficient and sometimes NOT BETWEEN version 2 is more efficient. The OR version is always the least efficient. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Why is this so much more efficient?
2017-02-15 14:18 GMT+01:00 Cecil Westerhof : > The OR version is the least efficient and it look likes the BETWEEN > version 2 is the most efficient. It looks like it uses less user and more > sys. > Which is the most efficient is also dependent on the state of the database itself. I compacted the database and now sometimes NOT BETWEEN is more efficient and sometimes NOT BETWEEN version 2 is more efficient. The OR version is always the least efficient. -- Cecil Westerhof ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Why is this so much more efficient?
2017-02-15 13:40 GMT+01:00 Cecil Westerhof : > I wrote the following Bash script: > I wrote a better one. See end of post. Inprinciple you are only interested in the totals. I also changed the runs from 10 to 25. The OR version is the least efficient and it look likes the BETWEEN version 2 is the most efficient. It looks like it uses less user and more sys. The results of two runs: Timing OR version real: 7.345 user: 6.688 sys: 0.648 Timing ABS version real: 6.403 user: 5.56 sys: 0.84 Timing NOT BETWEEN version real: 6.382 user: 5.62 sys: 0.752 Timing NOT BETWEEN version 2 real: 6.291 user: 5.488 sys: 0.788 and: Timing OR version real: 7.228 user: 6.496 sys: 0.728 Timing ABS version real: 6.647 user: 5.904 sys: 0.74 Timing NOT BETWEEN version real: 6.431 user: 5.688 sys: 0.736 Timing NOT BETWEEN version 2 real: 6.263 user: 5.492 sys: 0.756 The script: #!/usr/bin/env bash set -o errexit set -o nounset declare -r COUNT=25 declare -r DISPLAY_ALL=F declare -r SELECT=' SELECT date , time , usertime , systemtime , idletime , waittime , stolentime , (usertime + systemtime + idletime + waittime + stolentime) AS totaltime FROM vmstat ' declare -r ABS_VERSION=" ${SELECT} WHERE ABS(100 - totaltime) > 1 ;" declare -r OR_VERSION=" ${SELECT} WHERE totaltime < 99 OR totaltime > 101 ;" declare -r NOT_BETWEEN_VERSION=" ${SELECT} WHERE totaltime NOT BETWEEN 99 AND 101 ;" declare -r NOT_BETWEEN_VERSION2=" ${SELECT} WHERE NOT (totaltime BETWEEN 99 AND 101) ;" function timeQuery { for i in $(seq ${COUNT}) ; do sqlite3 ~/Databases/general.sqlite
Re: [sqlite] Why is this so much more efficient?
2017-02-15 12:02 GMT+01:00 R Smith : > Note however that this may not be entirely true. The Query might read data > from the disk cache (or several memory caches may be in play) during the > second run. Run each statement many times, and compare average return times. > > Also try this: > ... WHERE totaltime NOT BETWEEN 99 AND 101; > or > ... WHERE NOT (totaltime BETWEEN 99 AND 101); > I wrote the following Bash script: #!/usr/bin/env bash set -o errexit set -o nounset declare -r COUNT=10 declare -r SELECT=' SELECT date , time , usertime , systemtime , idletime , waittime , stolentime , (usertime + systemtime + idletime + waittime + stolentime) AS totaltime FROM vmstat ' declare -r ABS_VERSION=" ${SELECT} WHERE ABS(100 - totaltime) > 1 ;" declare -r OR_VERSION=" ${SELECT} WHERE totaltime < 99 OR totaltime > 101 ;" declare -r NOT_BETWEEN_VERSION=" ${SELECT} WHERE totaltime NOT BETWEEN 99 AND 101 ;" declare -r NOT_BETWEEN_VERSION2=" ${SELECT} WHERE NOT (totaltime BETWEEN 99 AND 101) ;" function timeQuery { for i in $(seq ${COUNT}) ; do sqlite3 ~/Databases/general.sqlite
Re: [sqlite] Why is this so much more efficient?
2017-02-15 12:02 GMT+01:00 R Smith : > > On 2017/02/15 12:33 PM, Cecil Westerhof wrote: > >> I have a table vmstat that I use to store vmstat info. ;-) >> At the moment it has more as 661 thousand records. >> >> In principle the values of usertime, systemtime, idletime, waittime and >> stolentime should add up to 100. I just wanted to check it. Of-course >> there >> could be a rounding error, so I wrote the following query: >> SELECT date >> , time >> , usertime >> , systemtime >> , idletime >> , waittime >> , stolentime >> , (usertime + systemtime + idletime + waittime + stolentime) AS >> totaltime >> FROM vmstat >> WHERE totaltime < 99 OR totaltime > 101 >> >> I did not like that, so I rewrote the WHERE to: >> WHERE ABS(100 - totaltime) > 1 >> >> The funny thing the second WHERE is more efficient as the first, where I >> would have expected it to be the other way around. >> The first takes around 1.050 milliseconds. >> The second takes around 950 milliseconds. >> So the second is around 10% more efficient. Why is this? >> > > That's because the first one executes 2 checks mostly - it first checks to > see if totaltime < 99, if so then it returns true, if not, then a second > comparison has to be done... so 2 comparison functions on many items. > > The second check involves a single calculation and comparison - so what > you have deduced is that the "minus" function is slightly more efficient > than occasional extra comparison function. > Minus function and abs function. > Note however that this may not be entirely true. The Query might read data > from the disk cache (or several memory caches may be in play) during the > second run. Run each statement many times, and compare average return times. > I did. Not very much. But maybe I should try it a bit more often. Also try this: > ... WHERE totaltime NOT BETWEEN 99 AND 101; > or > ... WHERE NOT (totaltime BETWEEN 99 AND 101); > I will try those also. -- Cecil Westerhof ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Why is this so much more efficient?
On 2017/02/15 12:33 PM, Cecil Westerhof wrote: I have a table vmstat that I use to store vmstat info. ;-) At the moment it has more as 661 thousand records. In principle the values of usertime, systemtime, idletime, waittime and stolentime should add up to 100. I just wanted to check it. Of-course there could be a rounding error, so I wrote the following query: SELECT date , time , usertime , systemtime , idletime , waittime , stolentime , (usertime + systemtime + idletime + waittime + stolentime) AS totaltime FROM vmstat WHERE totaltime < 99 OR totaltime > 101 I did not like that, so I rewrote the WHERE to: WHERE ABS(100 - totaltime) > 1 The funny thing the second WHERE is more efficient as the first, where I would have expected it to be the other way around. The first takes around 1.050 milliseconds. The second takes around 950 milliseconds. So the second is around 10% more efficient. Why is this? That's because the first one executes 2 checks mostly - it first checks to see if totaltime < 99, if so then it returns true, if not, then a second comparison has to be done... so 2 comparison functions on many items. The second check involves a single calculation and comparison - so what you have deduced is that the "minus" function is slightly more efficient than occasional extra comparison function. Note however that this may not be entirely true. The Query might read data from the disk cache (or several memory caches may be in play) during the second run. Run each statement many times, and compare average return times. Also try this: ... WHERE totaltime NOT BETWEEN 99 AND 101; or ... WHERE NOT (totaltime BETWEEN 99 AND 101); Cheers, Ryan ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Why is this so much more efficient?
I have a table vmstat that I use to store vmstat info. ;-) At the moment it has more as 661 thousand records. In principle the values of usertime, systemtime, idletime, waittime and stolentime should add up to 100. I just wanted to check it. Of-course there could be a rounding error, so I wrote the following query: SELECT date , time , usertime , systemtime , idletime , waittime , stolentime , (usertime + systemtime + idletime + waittime + stolentime) AS totaltime FROM vmstat WHERE totaltime < 99 OR totaltime > 101 I did not like that, so I rewrote the WHERE to: WHERE ABS(100 - totaltime) > 1 The funny thing the second WHERE is more efficient as the first, where I would have expected it to be the other way around. The first takes around 1.050 milliseconds. The second takes around 950 milliseconds. So the second is around 10% more efficient. Why is this? In case it is important: I did this in sqlitebrowser 3.7.0, which uses SQLite 3.8.10.2. -- Cecil Westerhof ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users