Re: Indexing fragments of a column's value ?

2023-11-14 Thread dld



You can index on expressions, and these will be recognised by the query 
generator.


drop schema tmp CASCADE;
create schema tmp;
set search_path = tmp;

CREATE TABLE bagger
    ( eight CHAR(8) NOT NULL PRIMARY KEY
    , more text
    );
CREATE INDEX bagger_idx_12 ON bagger (substr(eight,1,2));
CREATE INDEX bagger_idx_34 ON bagger (substr(eight,3,2));
CREATE INDEX bagger_idx_58 ON bagger (substr(eight,5,4));

INSERT INTO bagger(eight, more)
SELECT translate(to_hex( gs), ' ' , '0')
    , gs::text
FROM generate_series(0,40, 64999) gs
    ;

VACUUM ANALYZE bagger;

EXPLAIN ANALYZE
SELECT * FROM bagger
WHERE 1=1
    -- AND eight >= '00' AND eight < '05'
    AND substr(eight, 1,2) >= '30' AND substr(eight,1,2) < '05'
    AND substr(eight, 3,2) >= '90' AND substr(eight,3,2) < 'A5'
    AND substr(eight, 5,4) >= '' AND substr(eight,5,4) < ''
    ;

/*** The optimiser is smart enough to ignore one of the indexes. */
QUERY PLAN
-
 Bitmap Heap Scan on bagger  (cost=41.46..59.62 rows=1 width=19) 
(actual time=0.049..0.056 rows=0 loops=1)
   Recheck Cond: ((substr((eight)::text, 1, 2) >= '30'::text) AND 
(substr((eight)::text, 1, 2) < '05'::text) AND (substr((eight)::text, 3, 
2) >= '90'::text) AND (substr((eight)::text, 3, 2) < 'A5'::text))
   Filter: ((substr((eight)::text, 5, 4) >= ''::text) AND 
(substr((eight)::text, 5, 4) < ''::text))
   ->  BitmapAnd  (cost=41.46..41.46 rows=16 width=0) (actual 
time=0.042..0.047 rows=0 loops=1)
 ->  Bitmap Index Scan on bagger_idx_12 (cost=0.00..4.47 
rows=308 width=0) (actual time=0.039..0.039 rows=0 loops=1)
   Index Cond: ((substr((eight)::text, 1, 2) >= '30'::text) 
AND (substr((eight)::text, 1, 2) < '05'::text))
 ->  Bitmap Index Scan on bagger_idx_34 (cost=0.00..36.74 
rows=3205 width=0) (never executed)
   Index Cond: ((substr((eight)::text, 3, 2) >= '90'::text) 
AND (substr((eight)::text, 3, 2) < 'A5'::text))

 Planning Time: 5.487 ms
 Execution Time: 0.310 ms
(10 rows)


HTH,

AvK





Re: Indexing fragments of a column's value ?

2023-11-13 Thread Kirk Wolak
On Fri, Nov 3, 2023 at 3:34 PM David Gauthier  wrote:

> I'm asking about the possibility of indexing portions of a column's value
> where the column has a static field format.  Example, a char(8) which
> contains all hex values (basically a hex number that's always 8 chars wide,
> leading zeros if needed). Someone might want to select all recs where the
> first 2 digits are 'ff' or maybe the last 4 hex digits match regexp_match
> '00[cdef]{2}' or maybe a match of the entire string... "0dd63a87".
>
> If I know the placement and width of the fields that need to be indexed,
> can indices be defined to facilitate queries ?  Example...
> - match all 8 chars
> - match the 3rd and 4th chars
> - match the last 4 chars
>
> I suppose I could fragment the thing into multiple columns for the
> purposes of a search.  So add a column called "last4" as a char(4) that
> matches the last 4 chars of that column.  Then index that.  etc... But
> inquiring to see if there is something more elegant.
>
> Just curious why not store it as a HEX string (functional index, even).
And allow searching against the stringified hex, then your regex works?

Kirk


Re: Indexing fragments of a column's value ?

2023-11-03 Thread Thomas Boussekey
Le ven. 3 nov. 2023 à 21:01, Tom Lane  a écrit :

> David Gauthier  writes:
> > I'm asking about the possibility of indexing portions of a column's value
> > where the column has a static field format.
>
> GIN indexes are meant for exactly that.  You might have to write your
> own opclass to break up the input values in the way you want though.
>
> A less difficult answer would be to write a function that breaks up
> the input into (say) an array of text and then use the existing
> GIN array support.  But you'd pay for that by needing to write more
> complicated queries to use the index.
>
> regards, tom lane
>
>
> Hello David,

Reading your mail, it seems that your data column contains 3 different
kinds of atomic information:

* Characters 1-2
* Characters 3-4
* Characters 5-8

Does it make sense to split this data into 3 separate columns?
Each one could be indexed, and you can rebuild the original thanks to a
generated column:
https://www.postgresql.org/docs/current/ddl-generated-columns.html

HTH, Thomas


Re: Indexing fragments of a column's value ?

2023-11-03 Thread Tom Lane
David Gauthier  writes:
> I'm asking about the possibility of indexing portions of a column's value
> where the column has a static field format.

GIN indexes are meant for exactly that.  You might have to write your
own opclass to break up the input values in the way you want though.

A less difficult answer would be to write a function that breaks up
the input into (say) an array of text and then use the existing
GIN array support.  But you'd pay for that by needing to write more
complicated queries to use the index.

regards, tom lane




Indexing fragments of a column's value ?

2023-11-03 Thread David Gauthier
I'm asking about the possibility of indexing portions of a column's value
where the column has a static field format.  Example, a char(8) which
contains all hex values (basically a hex number that's always 8 chars wide,
leading zeros if needed). Someone might want to select all recs where the
first 2 digits are 'ff' or maybe the last 4 hex digits match regexp_match
'00[cdef]{2}' or maybe a match of the entire string... "0dd63a87".

If I know the placement and width of the fields that need to be indexed,
can indices be defined to facilitate queries ?  Example...
- match all 8 chars
- match the 3rd and 4th chars
- match the last 4 chars

I suppose I could fragment the thing into multiple columns for the purposes
of a search.  So add a column called "last4" as a char(4) that matches the
last 4 chars of that column.  Then index that.  etc... But inquiring to see
if there is something more elegant.

Thanks !