Re: [sqlite] [EXTERNAL] virtual tables, and theTrue meaning of pIdxInfo->estimatedCost, pIdxInfo->estimatedRows, and pIdxInfo->idxFlags...

2017-10-20 Thread dave
> Behalf Of Hick Gunter
> Sent: Friday, October 20, 2017 1:55 AM
>
> I can provide some info coming from our experience with SQLite 3.7.14:
> 
> Since most SQl processing is IO bound, the "estimated cost" 
> should be the number of disk IO operations required to 
> retrieve the rows. The later addition of "estimated rows" 
> reflects to the fact that some virtual table implementations 
> might use non-disk storage (e.g. process or shared memory), 
> where the number of IO operations is determined by the 
> resident set and the cost of paging/swapping.
> 
> Lets say you have 1 records of 200 bytes with 50 bytes of 
> key overhead stored in some kind of ISAM file, and a page size of 4k.
> 
> Performing a full table scan will take an estimated 1 * 
> 200 / 4096 ~= 489 disk accesses, whereas looking up a single 
> record will take about 3 (50 bytes per key in a 4096 byte 
> page gives an estimated fan out of over 100, resulting in 2 
> pages to read from the index and 1 for the record itself). 
> Performing a partial index scan that returns 100 records will 
> take 2 acesses to locate the first record, 1 more if a second 
> index page is required and anywhere between 5 (if the records 
> are contiguous) and 100 (if each is from a separate page) 
> accesses to retrieve the records themselves.
> 
> Regarding the UNIQUE flag, this is quite different from the 
> number of estimated rows, which may be 0 or 1 due to rounding 
> errors on a non-unique index (e.g. the initials of a set of 
> 100 people has a cardinality of 26*26=676, giving an average 
> number of 0,1479 records per index entry, but there may still 
> be duplicates).
 
Thanks so much, Hick, for the detailed info.

I guess I am still a little unclear about the importance of
SQLITE_INDEX_SCAN_UNIQUE, but I am interpreting your statement to mean
something like 'it is a more assertive statement about the number of rows
returned than the row estimate of an equal value', and that somehow guides
the query planner more strongly in some direction.  It also sounds like what
I was doing (described in my first message, here elided), was fine.

Thanks, and cheers!
-dave


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] virtual tables, and theTrue meaning of pIdxInfo->estimatedCost, pIdxInfo->estimatedRows, and pIdxInfo->idxFlags...

2017-10-19 Thread Hick Gunter
I can provide some info coming from our experience with SQLite 3.7.14:

Since most SQl processing is IO bound, the "estimated cost" should be the 
number of disk IO operations required to retrieve the rows. The later addition 
of "estimated rows" reflects to the fact that some virtual table 
implementations might use non-disk storage (e.g. process or shared memory), 
where the number of IO operations is determined by the resident set and the 
cost of paging/swapping.

Lets say you have 1 records of 200 bytes with 50 bytes of key overhead 
stored in some kind of ISAM file, and a page size of 4k.

Performing a full table scan will take an estimated 1 * 200 / 4096 ~= 489 
disk accesses, whereas looking up a single record will take about 3 (50 bytes 
per key in a 4096 byte page gives an estimated fan out of over 100, resulting 
in 2 pages to read from the index and 1 for the record itself). Performing a 
partial index scan that returns 100 records will take 2 acesses to locate the 
first record, 1 more if a second index page is required and anywhere between 5 
(if the records are contiguous) and 100 (if each is from a separate page) 
accesses to retrieve the records themselves.

Regarding the UNIQUE flag, this is quite different from the number of estimated 
rows, which may be 0 or 1 due to rounding errors on a non-unique index (e.g. 
the initials of a set of 100 people has a cardinality of 26*26=676, giving an 
average number of 0,1479 records per index entry, but there may still be 
duplicates).

-Ursprüngliche Nachricht-
Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
Auftrag von dave
Gesendet: Donnerstag, 19. Oktober 2017 18:43
An: 'SQLite mailing list' 
Betreff: [EXTERNAL] [sqlite] virtual tables, and theTrue meaning of 
pIdxInfo->estimatedCost, pIdxInfo->estimatedRows, and pIdxInfo->idxFlags...

Hi folks,

I am trying to fully understand the impact and correct use of a few subtle 
features related to virtual tables' the xBestIndex mechanism, and their correct 
use.  Here are my current beliefs:

*  pIdxInfo->estimatedCost
  obviously the cost of the proposed plan; a metric of the 'viscosity' of the 
table when traversing through xNext relative to other tables and especially to 
filesystem access
*  pIdxInfo->estimatedRows
  obviously the approximate number of rows that a proposed plan will return.
But less obvious to me is how this materially affects the query plan, 
especially relative to pIdxInfo->estimatedCost
  and a little bit with respect to:
* pIdxInfo->idxFlags
  when the SQLITE_INDEX_SCAN_UNIQUE is set.  Isn't setting
pIdxInfo->estimatedRows to 0 or 1 enough to communicate this same
information?

Anyway, I am dutifully setting both estimatedRows and idxFlags in cases where I 
have a 0-or-1-result table (I have several of these), and I am also 
estimatedRows to LLONG_MAX along with estimatedCost to DBL_MAX in cases where a 
plan can never be executed (btw I would respectfully suggest perhaps using a 
bit in idxFlags to communicate 'never use this plan, it will never work').

I haven't had any ill effects doing the above, but wonder if that is 'correct'. 
 Also, it would be interesting just to know what the material effect of 
estimatedRows and idxFlags is, so that I can maybe use them more effectively.

Any thoughts or corrections to my thinking?  Thanks in advance; cheers!
-dave
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick | Software Engineer | Scientific Games International GmbH | 
Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 
1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users