Re: [sqlite] [EXTERNAL] "Optional" incoming parameters for virtual tables
On Thu, 18 Apr 2019 at 09:48, Hick Gunter wrote: > So the QP is attemopting to determine which cost product is less: > > My guess is it will probably choose the undesired, invalid plan. A linear > cost penalty for desired but optional inputs is probably not going to cut > it. In this case, the returned cost needs to be greater than 4 (more if > there are more records in the IN set). I feel there needs to be a > multiplicative penalty of at least 10. > > > As I mentioned in other post, the direct test with my current implementation and IN operator gave a good result, but I understand this can't be a rule. I see your points about cost adjustments so will give them a try. I also decided to look at the sources, estimatedCost as usable input only mentioned once, it goes to rRun field being translated with sqlite3LogEstFromDouble call. The LogEst value is nicely explained in the sources (below is the compacted version of the comment in the sources) Estimated quantities used for query planning are stored as 16-bit > logarithms. For quantity X, the value stored is 10*log2(X). This gives a > possible range of values of approximately 1.0e986 to 1e-986. But the > allowed values are "grainy". Not every value is representable. For > example, quantities 16 and 17 are both represented by a LogEst of 40. > However, since LogEst quantities are suppose to be estimates, not exact > values, this imprecision is not a problem. > "LogEst" is short for "Logarithmic Estimate". > Examples: > 1 -> 0 20 -> 43 1 -> 132 > 2 -> 10 25 -> 46 25000 -> 146 > 3 -> 16100 -> 66100 -> 199 > 4 -> 20 1000 -> 991048576 -> 200 > 10 -> 33 1024 -> 1004294967296 -> 320 > The LogEst can be negative to indicate fractional values. > Examples: > 0.5 -> -10 0.1 -> -330.0625 -> -40 > It seems that as a general rule it's not good to have close estimatedCost values since they will sometimes yield the same LogEst values. rRun evaluations in the sources are too complex to analyze Max ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] [EXTERNAL] "Optional" incoming parameters for virtual tables
So the QP is attemopting to determine which cost product is less: Outer loop IN table (2 records, assuming cost 2) * Inner loop VT (2) = 4 Outer loop VT (3) * inner loop IN lookup (1 record, assuming cost 1) = 3 My guess is it will probably choose the undesired, invalid plan. A linear cost penalty for desired but optional inputs is probably not going to cut it. In this case, the returned cost needs to be greater than 4 (more if there are more records in the IN set). I feel there needs to be a multiplicative penalty of at least 10. Also, you could check the colUsed mask to check if the name_pattern field is referenced and react accordingly. -Ursprüngliche Nachricht- Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im Auftrag von Max Vlasov Gesendet: Mittwoch, 17. April 2019 14:56 An: SQLite mailing list Betreff: Re: [sqlite] [EXTERNAL] "Optional" incoming parameters for virtual tables On Wed, 17 Apr 2019 at 15:08, Hick Gunter wrote: > ... > > SELECT contents from textfiles( 'mypath', NULL, 0); SELECT contents > from textfiles where search_path = 'mypath' and is_recursive = 1; > SLEECT contents from textfiles where search_path = 'mypath' and > name_pattern IN ('*.txt','*.csv'); > > With xBestIndex returning costs 1, 2, and 2/3 (with IN and without IN) > respectively. > > And you want to know how to make SQLite always call xFilter with > ('mypath','*.txt') and ('mypath','*.csv') instead of just once with > only > ('mypath') and attempting to retrieve the undefined/empty > name_pattern. Is this correct? > > > Yes, you're correct. The name_pattern is also a good illustrative extension to the case. And indeed, IN operator is really a tricky case ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0 May be privileged. May be confidential. Please delete if not the addressee. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] [EXTERNAL] "Optional" incoming parameters for virtual tables
On Wed, 17 Apr 2019 at 15:08, Hick Gunter wrote: > > SLEECT contents from textfiles where search_path = 'mypath' and > name_pattern IN ('*.txt','*.csv'); > > It's interesting, I implemented the mask and decided to give such a query a try (having the same cost adjust I explained in the first post). With newest versions and even older 3.15.1, it works ok with two cursors one after another (two xOpen calls) providing constraints with two masks. Version 3.8.3 visits constraints with optional parameters, but chooses a different route without filemask in constraint (so the tables outputs with a default mask). 3.6.10 both doesn't visit constraints with optional parameters and also gives results with default mask. Looks like similar to what you mentioned in your first reply. The usage of IN allows many expressive applications to such queries, but I should probably have in mind that sometimes it is safe to implement a data piece in a single entity so file mask may acquire something like comma-list (or semocolon) "*.txt;*.csv" Max ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] [EXTERNAL] "Optional" incoming parameters for virtual tables
On Wed, 17 Apr 2019 at 15:08, Hick Gunter wrote: > ... > > SELECT contents from textfiles( 'mypath', NULL, 0); > SELECT contents from textfiles where search_path = 'mypath' and > is_recursive = 1; > SLEECT contents from textfiles where search_path = 'mypath' and > name_pattern IN ('*.txt','*.csv'); > > With xBestIndex returning costs 1, 2, and 2/3 (with IN and without IN) > respectively. > > And you want to know how to make SQLite always call xFilter with > ('mypath','*.txt') and ('mypath','*.csv') instead of just once with only > ('mypath') and attempting to retrieve the undefined/empty name_pattern. Is > this correct? > > > Yes, you're correct. The name_pattern is also a good illustrative extension to the case. And indeed, IN operator is really a tricky case ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] [EXTERNAL] "Optional" incoming parameters for virtual tables
Ok so lets assume your xConnect function declares something similar to Create table textfiles( result_path text, result_name text, contents blob, search_path text hidden, name_pattern text hidden, is_recursive hidden); So that you can SELECT contents from textfiles( 'mypath', NULL, 0); SELECT contents from textfiles where search_path = 'mypath' and is_recursive = 1; SLEECT contents from textfiles where search_path = 'mypath' and name_pattern IN ('*.txt','*.csv'); With xBestIndex returning costs 1, 2, and 2/3 (with IN and without IN) respectively. And you want to know how to make SQLite always call xFilter with ('mypath','*.txt') and ('mypath','*.csv') instead of just once with only ('mypath') and attempting to retrieve the undefined/empty name_pattern. Is this correct? -Ursprüngliche Nachricht- Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im Auftrag von Max Vlasov Gesendet: Mittwoch, 17. April 2019 13:03 An: SQLite mailing list Betreff: Re: [sqlite] [EXTERNAL] "Optional" incoming parameters for virtual tables On Wed, 17 Apr 2019 at 12:52, Hick Gunter wrote: > Your xBestIndex function should be returning a cost that is > proportional to the "effort required to fulfill the query". My own VT > implementations have been returning the total number of records for > queries with no constraints and assuming a constant fan-out factor for > each key field provided as a constraint (eg. 25 for a full table scan, > 5 if only the first of two key fields is provided and 1 if both are provided). > I suspect that you talk more about tables that possess and outputs data regardless of "incoming" parameters involved. I understand that in this case it is mostly about performance-wise issues when the worst-case scenario is a time penalty but still valid results. I'm here more about table-valued tables when for some of them not providing incoming parameters means not valid data at all. One of my examples is a virtual table outputting text files from a folder. Folder is a required parameter, but there's also a binary flag regarding whether the scan processes subfolders. If the flag is provided, the implementation wants it to be used in the constraint chosen. Otherwise the intention from the query will not be transferred and the scan will be folder-recursive despite the fact that the flag disabling it appeared in the query. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0 May be privileged. May be confidential. Please delete if not the addressee. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] [EXTERNAL] "Optional" incoming parameters for virtual tables
On Wed, 17 Apr 2019 at 12:52, Hick Gunter wrote: > Your xBestIndex function should be returning a cost that is proportional > to the "effort required to fulfill the query". My own VT implementations > have been returning the total number of records for queries with no > constraints and assuming a constant fan-out factor for each key field > provided as a constraint (eg. 25 for a full table scan, 5 if only the first > of two key fields is provided and 1 if both are provided). > I suspect that you talk more about tables that possess and outputs data regardless of "incoming" parameters involved. I understand that in this case it is mostly about performance-wise issues when the worst-case scenario is a time penalty but still valid results. I'm here more about table-valued tables when for some of them not providing incoming parameters means not valid data at all. One of my examples is a virtual table outputting text files from a folder. Folder is a required parameter, but there's also a binary flag regarding whether the scan processes subfolders. If the flag is provided, the implementation wants it to be used in the constraint chosen. Otherwise the intention from the query will not be transferred and the scan will be folder-recursive despite the fact that the flag disabling it appeared in the query. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] [EXTERNAL] "Optional" incoming parameters for virtual tables
Your xBestIndex function should be returning a cost that is proportional to the "effort required to fulfill the query". My own VT implementations have been returning the total number of records for queries with no constraints and assuming a constant fan-out factor for each key field provided as a constraint (eg. 25 for a full table scan, 5 if only the first of two key fields is provided and 1 if both are provided). Newer releases of SQLite handle IN expressions differently. Older releases simply created an indexed ephemeral table and an Affinity opcode. Newer releases call xBestIndex twice; once with each IN transformed into an equality constraint, and once with (all of) these constraints disabled. Depending on the cost, the non-indexed ephemeral table would be placed on either side of the JOIN operation. I found it was necessary to increase the cost by at least 2.5% to discourage SQLite from putting the ephemeral table on the LHS (outer loop) of the join. -Ursprüngliche Nachricht- Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im Auftrag von Max Vlasov Gesendet: Mittwoch, 17. April 2019 10:44 An: SQLite mailing list Betreff: [EXTERNAL] [sqlite] "Optional" incoming parameters for virtual tables Hi, I'm trying to implement "optional incoming" parameters for my virtual tables. It's when there are columns required, but also there are some fine-tuning columns that should be noticed when provided in the query and assumed some default if not. The system that I tried seems to work, but I would like an opinion whether it is future-proof. - When the required parameters are not provided, xBestIndex returns a large cost for earlier sqlite version and additionallly SQLITE_CONSTRAINT for 3.26.0 onward. - When the required parameters provided and all optional provided the cost is 1. - For every optional parameter not provided, the cost is increased by 1. So if there are 3 optional parameters and no provided, the cost is 4, if one optional provided - the cost is 2. Should this always work as expected or sqlite might not always choose the lowest cost index amongst the ones with small values? I tested it with a couple of vt implementations, but the number of columns in both required/optional pool are low and I expect some trouble when the number will go up. Thanks, Max ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0 May be privileged. May be confidential. Please delete if not the addressee. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users