On Sat, Jun 29, 2019 at 12:51 AM Nikita Glukhov
<[email protected]> wrote:>
> On 29.06.2019 1:23, Julien Rouhaud wrote:
>
> But that kinda resembles stuff we already have - selectivity/cost. So
> why shouldn't this be considered as part of costing?
>
> Yeah, I'm not entirely convinced that we need anything new here.
> The cost estimate function can detect such situations, and so can
> the index AM at scan start --- for example, btree checks for
> contradictory quals at scan start. There's a certain amount of
> duplicative effort involved there perhaps, but you also have to
> keep in mind that we don't know the values of run-time-determined
> comparison values until scan start. So if you want certainty rather
> than just a cost estimate, you may have to do these sorts of checks
> at scan start.
>
> Ah, I didn't know about _bt_preprocess_keys(). I'm not familiar with
> this code, so please bear with me. IIUC the idea would be to add
> additional logic in gingetbitmap() / ginNewScanKey() to drop some
> quals at runtime. But that would mean that additional logic would
> also be required in BitmapHeapScan, or that all the returned bitmap
> should be artificially marked as lossy to enforce a recheck?
>
> We have a similar solution for this problem. The idea is to avoid full index
> scan inside GIN itself when we have some GIN entries, and forcibly recheck
> all tuples if triconsistent() returns GIN_MAYBE for the keys that emitted no
> GIN entries.
Thanks for looking at it. That's I think a way better approach.
> The attached patch in its current shape contain at least two ugly places:
>
> 1. We still need to initialize empty scan key to call triconsistent(), but
> then we have to remove it from the list of scan keys. Simple refactoring
> of ginFillScanKey() can be helpful here.
>
> 2. We need to replace GIN_SEARCH_MODE_EVERYTHING with GIN_SEARCH_MODE_ALL
> if there are no GIN entries and some key requested GIN_SEARCH_MODE_ALL
> because we need to skip NULLs in GIN_SEARCH_MODE_ALL. Simplest example
> here
> is "array @> '{}'": triconsistent() returns GIN_TRUE, recheck is not
> forced,
> and GIN_SEARCH_MODE_EVERYTHING returns NULLs that are not rechecked. Maybe
> it would be better to introduce new GIN_SEARCH_MODE_EVERYTHING_NON_NULL.
Also
+ if (searchMode == GIN_SEARCH_MODE_ALL && nQueryValues <= 0)
+ {
+ /*
+ * Don't emit ALL key with no entries, check only whether
+ * unconditional recheck is needed.
+ */
+ GinScanKey key = &so->keys[--so->nkeys];
+
+ hasSearchAllMode = true;
+ so->forcedRecheck = key->triConsistentFn(key) != GIN_TRUE;
+ }
Shouldn't you make sure that the forcedRecheck flag can't reset?
> -- patched
> EXPLAIN ANALYZE SELECT * FROM test WHERE t LIKE '%1234%' AND t LIKE '%1%';
> QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------
> Bitmap Heap Scan on test (cost=20.43..176.79 rows=42 width=6) (actual
> time=0.287..0.424 rows=300 loops=1)
> Recheck Cond: ((t ~~ '%1234%'::text) AND (t ~~ '%1%'::text))
> Rows Removed by Index Recheck: 2
> Heap Blocks: exact=114
> -> Bitmap Index Scan on test_t_idx (cost=0.00..20.42 rows=42 width=0)
> (actual time=0.271..0.271 rows=302 loops=1)
> Index Cond: ((t ~~ '%1234%'::text) AND (t ~~ '%1%'::text))
> Planning Time: 0.080 ms
> Execution Time: 0.450 ms
> (8 rows)
One thing that's bothering me is that the explain implies that the
LIKE '%i% was part of the index scan, while in reality it wasn't. One
of the reason why I tried to modify the qual while generating the path
was to have the explain be clearer about what is really done.