Re: Crash in BRIN minmax-multi indexes

2022-12-30 Thread Tomas Vondra
I finally pushed this fix.

In the end I both relaxed the assert a little bit to allow calling
build_distances for a single range, and added a bail out so that the
caller gets regular NULL and not whatever palloc(0) produces.

Thanks again for the report!


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Crash in BRIN minmax-multi indexes

2022-11-10 Thread Tomas Vondra
On 11/9/22 00:13, Justin Pryzby wrote:
> On Mon, Oct 03, 2022 at 10:29:38PM +0200, Tomas Vondra wrote:
>> I'll get this fixed in a couple days. Considering the benign nature of
>> this issue (unnecessary assert) I'm not going to rush.
> 
> Is this still an outstanding issue ?
> 

Yes. I'll get it fixed, but it's harmless in practice (without asserts),
and I've been focusing on the other issue with broken NULL-handling in
BRIN indexes.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Crash in BRIN minmax-multi indexes

2022-11-08 Thread Justin Pryzby
On Mon, Oct 03, 2022 at 10:29:38PM +0200, Tomas Vondra wrote:
> I'll get this fixed in a couple days. Considering the benign nature of
> this issue (unnecessary assert) I'm not going to rush.

Is this still an outstanding issue ?

-- 
Justin




Re: Crash in BRIN minmax-multi indexes

2022-10-03 Thread Jaime Casanova
On Mon, Oct 03, 2022 at 10:29:38PM +0200, Tomas Vondra wrote:
> On 10/3/22 21:25, Jaime Casanova wrote:
> > On Mon, Oct 03, 2022 at 07:53:34PM +0200, Tomas Vondra wrote:
> >> On 9/29/22 08:53, Jaime Casanova wrote:
> >>> ...
> >>>
> >>> Just found one more ocurrance of this one with this index while an
> >>> autovacuum was running:
> >>>
> >>> """
> >>> CREATE INDEX bt_f8_heap_seqno_idx 
> >>> ON public.bt_f8_heap 
> >>> USING brin (seqno float8_minmax_multi_ops);
> >>> """
> >>> Attached is a backtrace.
> >>
> >> Thanks for the report!
> >>
> >> I think I see the issue - brin_minmax_multi_union does not realize the
> >> two summaries could have just one range each, and those can overlap so
> >> that merge_overlapping_ranges combines them into a single one.
> >>
> >> This is harmless, except that the assert int build_distances is overly
> >> strict. Not sure if we should just remove the assert or only compute the
> >> distances with (neranges>1).
> >>
> >> Do you happen to have the core dump? It'd be useful to look at ranges_a
> >> and ranges_b, to confirm this is indeed what's happening.
> >>
> > 
> > I do have it.
> > 
> > (gdb) p *ranges_a
> > $4 = {
> >   typid = 701,
> >   colloid = 0,
> >   attno = 0,
> >   cmp = 0x0,
> >   nranges = 0,
> >   nsorted = 1,
> >   nvalues = 1,
> >   maxvalues = 32,
> >   target_maxvalues = 32,
> >   values = 0x55d2ea1987c8
> > }
> > (gdb) p *ranges_b
> > $5 = {
> >   typid = 701,
> >   colloid = 0,
> >   attno = 0,
> >   cmp = 0x0,
> >   nranges = 0,
> >   nsorted = 1,
> >   nvalues = 1,
> >   maxvalues = 32,
> >   target_maxvalues = 32,
> >   values = 0x55d2ea196da8
> > }
> > 
> 
> Thanks. That mostly confirms my theory. I'd bet that this
> 
> (gdb) p ranges_a->values[0]
> (gdb) p ranges_b->values[0]
> 
> will print the same value. 
> 

you're right, they are same value

(gdb) p ranges_a->values[0]
$1 = 4679532294229561068
(gdb) p ranges_b->values[0]
$2 = 4679532294229561068

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL




Re: Crash in BRIN minmax-multi indexes

2022-10-03 Thread Tomas Vondra
On 10/3/22 21:25, Jaime Casanova wrote:
> On Mon, Oct 03, 2022 at 07:53:34PM +0200, Tomas Vondra wrote:
>> On 9/29/22 08:53, Jaime Casanova wrote:
>>> ...
>>>
>>> Just found one more ocurrance of this one with this index while an
>>> autovacuum was running:
>>>
>>> """
>>> CREATE INDEX bt_f8_heap_seqno_idx 
>>> ON public.bt_f8_heap 
>>> USING brin (seqno float8_minmax_multi_ops);
>>> """
>>> Attached is a backtrace.
>>
>> Thanks for the report!
>>
>> I think I see the issue - brin_minmax_multi_union does not realize the
>> two summaries could have just one range each, and those can overlap so
>> that merge_overlapping_ranges combines them into a single one.
>>
>> This is harmless, except that the assert int build_distances is overly
>> strict. Not sure if we should just remove the assert or only compute the
>> distances with (neranges>1).
>>
>> Do you happen to have the core dump? It'd be useful to look at ranges_a
>> and ranges_b, to confirm this is indeed what's happening.
>>
> 
> I do have it.
> 
> (gdb) p *ranges_a
> $4 = {
>   typid = 701,
>   colloid = 0,
>   attno = 0,
>   cmp = 0x0,
>   nranges = 0,
>   nsorted = 1,
>   nvalues = 1,
>   maxvalues = 32,
>   target_maxvalues = 32,
>   values = 0x55d2ea1987c8
> }
> (gdb) p *ranges_b
> $5 = {
>   typid = 701,
>   colloid = 0,
>   attno = 0,
>   cmp = 0x0,
>   nranges = 0,
>   nsorted = 1,
>   nvalues = 1,
>   maxvalues = 32,
>   target_maxvalues = 32,
>   values = 0x55d2ea196da8
> }
> 

Thanks. That mostly confirms my theory. I'd bet that this

(gdb) p ranges_a->values[0]
(gdb) p ranges_b->values[0]

will print the same value. I've been able to reproduce this, but it's
pretty difficult to get the timing right (and it requires table with
just a single value in that BRIN range).

I'll get this fixed in a couple days. Considering the benign nature of
this issue (unnecessary assert) I'm not going to rush.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Crash in BRIN minmax-multi indexes

2022-10-03 Thread Jaime Casanova
On Mon, Oct 03, 2022 at 07:53:34PM +0200, Tomas Vondra wrote:
> On 9/29/22 08:53, Jaime Casanova wrote:
> > ...
> > 
> > Just found one more ocurrance of this one with this index while an
> > autovacuum was running:
> > 
> > """
> > CREATE INDEX bt_f8_heap_seqno_idx 
> > ON public.bt_f8_heap 
> > USING brin (seqno float8_minmax_multi_ops);
> > """
> > Attached is a backtrace.
> 
> Thanks for the report!
> 
> I think I see the issue - brin_minmax_multi_union does not realize the
> two summaries could have just one range each, and those can overlap so
> that merge_overlapping_ranges combines them into a single one.
> 
> This is harmless, except that the assert int build_distances is overly
> strict. Not sure if we should just remove the assert or only compute the
> distances with (neranges>1).
> 
> Do you happen to have the core dump? It'd be useful to look at ranges_a
> and ranges_b, to confirm this is indeed what's happening.
> 

I do have it.

(gdb) p *ranges_a
$4 = {
  typid = 701,
  colloid = 0,
  attno = 0,
  cmp = 0x0,
  nranges = 0,
  nsorted = 1,
  nvalues = 1,
  maxvalues = 32,
  target_maxvalues = 32,
  values = 0x55d2ea1987c8
}
(gdb) p *ranges_b
$5 = {
  typid = 701,
  colloid = 0,
  attno = 0,
  cmp = 0x0,
  nranges = 0,
  nsorted = 1,
  nvalues = 1,
  maxvalues = 32,
  target_maxvalues = 32,
  values = 0x55d2ea196da8
}

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL




Re: Crash in BRIN minmax-multi indexes

2022-10-03 Thread Tomas Vondra
On 9/29/22 08:53, Jaime Casanova wrote:
> ...
> 
> Just found one more ocurrance of this one with this index while an
> autovacuum was running:
> 
> """
> CREATE INDEX bt_f8_heap_seqno_idx 
> ON public.bt_f8_heap 
> USING brin (seqno float8_minmax_multi_ops);
> """
> Attached is a backtrace.

Thanks for the report!

I think I see the issue - brin_minmax_multi_union does not realize the
two summaries could have just one range each, and those can overlap so
that merge_overlapping_ranges combines them into a single one.

This is harmless, except that the assert int build_distances is overly
strict. Not sure if we should just remove the assert or only compute the
distances with (neranges>1).

Do you happen to have the core dump? It'd be useful to look at ranges_a
and ranges_b, to confirm this is indeed what's happening.

If not, how reproducible is it?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Crash in BRIN minmax-multi indexes

2022-09-29 Thread Jaime Casanova
On Sun, Apr 04, 2021 at 07:52:50PM +0200, Tomas Vondra wrote:
> On 4/4/21 7:25 AM, Jaime Casanova wrote:
> > 
> > pgbench -i postgres
> > psql -c "CREATE INDEX ON pgbench_history USING brin (tid 
> > int4_minmax_multi_ops);" postgres
> > pgbench -c2 -j2 -T 300 -n postgres
> > 
> 
> Fixed and pushed too.
> 
> Turned out to be a silly bug in forgetting to remember the number of
> ranges after deduplication, which sometimes resulted in assert failure.
> It's a bit hard to trigger because concurrency / good timing is needed
> while summarizing the range, requiring a call to "union" function. Just
> running the pgbench did not trigger the issue for me, I had to manually
> call the brin_summarize_new_values().
> 
> For the record, I did a lot of testing with data randomized in various
> ways - the scripts are available here:
> 
> https://github.com/tvondra/brin-randomized-tests
> 
> It was focused on discovering issues in the distance functions, and
> comparing results with/without the index. Maybe the next step should be
> adding some changes to the data, which might trigger more issues like
> this one.
> 

Just found one more ocurrance of this one with this index while an
autovacuum was running:

"""
CREATE INDEX bt_f8_heap_seqno_idx 
ON public.bt_f8_heap 
USING brin (seqno float8_minmax_multi_ops);
"""

Attached is a backtrace.

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
set = {__val = {4194304, 140737005547296, 2, 6, 6807829, 
94364324073680, 4611686018427388799, 140153051314854, 0, 281470681751456, 0, 0, 
0, 
0, 0, 0}}
pid = 
tid = 
ret = 
#1  0x7f77ec9a4535 in __GI_abort () at abort.c:79
save_stage = 1
act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, 
sa_mask = {__val = {0, 0, 0, 0, 0, 140153048813557, 2, 
  3631085705129160672, 7018350057190602853, 94364324073680, 
7003715780713122240, 0, 12346699825684934912, 140737005547536, 
  94364359032568, 140737005548400}}, sa_flags = -402386736, 
sa_restorer = 0x55d2ea197ef8}
sigs = {__val = {32, 0 }}
#2  0x55d2e870514a in ExceptionalCondition (conditionName=0x55d2e87843b0 
"neranges >= 2", errorType=0x55d2e8783f14 "FailedAssertion", 
fileName=0x55d2e8783f00 "brin_minmax_multi.c", lineNumber=1338) at 
assert.c:69
No locals.
#3  0x55d2e804966f in build_distances (distanceFn=0x55d2ea1b2c88, 
colloid=0, eranges=0x55d2ea1bf7c0, neranges=1) at brin_minmax_multi.c:1338
i = 21970
ndistances = -367317728
distances = 0x55d2ea1b2c88
#4  0x55d2e804c595 in brin_minmax_multi_union (fcinfo=0x7fffe338f4c0) at 
brin_minmax_multi.c:2841
bdesc = 0x55d2ea1b2e88
col_a = 0x55d2ea196fb8
col_b = 0x55d2ea1bb7c8
colloid = 0
serialized_a = 0x55d2ea1b1dc0
serialized_b = 0x55d2ea1b1df8
ranges_a = 0x55d2ea198798
ranges_b = 0x55d2ea196d78
attno = 1
attr = 0x7f77e3693e88
eranges = 0x55d2ea1bf7c0
neranges = 1
cmpFn = 0x55d2ea1b2cc0
distanceFn = 0x55d2ea1b2c88
distances = 0x55d2e8785c27
ctx = 0x55d2ea1bf6a0
oldctx = 0x55d2ea196b30
#5  0x55d2e87100e0 in FunctionCall3Coll (flinfo=0x55d2ea1adbb0, 
collation=0, arg1=94364359143048, arg2=94364359028664, arg3=94364359178184)
at fmgr.c:1171
fcinfodata = {fcinfo = {flinfo = 0x55d2ea1adbb0, context = 0x0, 
resultinfo = 0x0, fncollation = 0, isnull = false, nargs = 3, 
args = 0x7fffe338f4e0}, 
  fcinfo_data = "\260\333\032\352\322U", '\000' , 
"\177\003\000\210.\033\352\322U\000\000\000k\031\352\322U\000\000\270o\031\352\322U\000\000\000\266\033\352\v\022\000\000ȷ\033\352\322U\000\000\000\333\032\352\322U\000"}
fcinfo = 0x7fffe338f4c0
result = 16812436272
__func__ = "FunctionCall3Coll"
#6  0x55d2e804451d in union_tuples (bdesc=0x55d2ea1b2e88, a=0x55d2ea196f90, 
b=0x55d2ea1b1cd0) at brin.c:1643
unionFn = 0x55d2ea1adbb0
col_a = 0x55d2ea196fb8
col_b = 0x55d2ea1bb7c8
opcinfo = 0x55d2ea1b2c70
keyno = 0
db = 0x55d2ea1bb7a0
cxt = 0x55d2ea1bb680
oldcxt = 0x55d2ea196b30
#7  0x55d2e8043fb2 in summarize_range (indexInfo=0x55d2ea1989b0, 
state=0x55d2ea198ac8, heapRel=0x7f77e3692330, heapBlk=5760, heapNumBlks=7281)
at brin.c:1452
newtup = 0x55d2ea1b1cd0
newsize = 40
didupdate = false
samepage = false
phbuf = 14589
phtup = 0x55d2ea1b1cd0
phsz = 40
offset = 16
scanNumBlks = 128
__func__ = "summarize_range"
#8  0x55d2e804415b in brinsummarize (index=0x7f77e3693b40, 
heapRel=0x7f77e3692330, pageRange=4294967295, include_partial=false, 
numSummarized=0x55d2ea197ba8, numExisting=0x55d2ea197ba8) at brin.c:1534
tup = 0x0

Re: Crash in BRIN minmax-multi indexes

2021-04-04 Thread Tomas Vondra
BTW, for the inet data type, I considered simply calling the "minus"
function, but that does not work because of this strange behavior:


int4=# select '10.1.1.102/32'::inet > '10.1.1.142/24'::inet;
 ?column?
--
 t
(1 row)

int4=# select '10.1.1.102/32'::inet - '10.1.1.142/24'::inet;
 ?column?
--
  -40
(1 row)


That is, (a>b) but then (a-b) < 0. AFAICS it's due to comparator
considering the mask, while the minus ignores it. I find it a bit
strange, but I assume it's intentional.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Crash in BRIN minmax-multi indexes

2021-04-04 Thread Tomas Vondra
On 4/4/21 7:25 AM, Jaime Casanova wrote:
> ...
> Changing to using month of 30 days on the formula fixed it.
> 

I've pushed fixes for all the bugs reported in this thread so far
(mostly distance calculations, ...), and one bug (swapped operator
parameters in one place) I discovered while working on the fixes.

> and I found another issue, this time involves autovacuum which makes it
> a little more complicated to reproduce.
> 
> Currently the only stable way to reproduce it is using pgbench:
> 
> pgbench -i postgres
> psql -c "CREATE INDEX ON pgbench_history USING brin (tid 
> int4_minmax_multi_ops);" postgres
> pgbench -c2 -j2 -T 300 -n postgres
> 

Fixed and pushed too.

Turned out to be a silly bug in forgetting to remember the number of
ranges after deduplication, which sometimes resulted in assert failure.
It's a bit hard to trigger because concurrency / good timing is needed
while summarizing the range, requiring a call to "union" function. Just
running the pgbench did not trigger the issue for me, I had to manually
call the brin_summarize_new_values().

For the record, I did a lot of testing with data randomized in various
ways - the scripts are available here:

https://github.com/tvondra/brin-randomized-tests

It was focused on discovering issues in the distance functions, and
comparing results with/without the index. Maybe the next step should be
adding some changes to the data, which might trigger more issues like
this one.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Crash in BRIN minmax-multi indexes

2021-04-03 Thread Jaime Casanova
On Thu, Apr 01, 2021 at 03:22:59PM +0200, Tomas Vondra wrote:
> On 4/1/21 3:09 PM, Zhihong Yu wrote:
> > Hi,
> > Can you try this patch ?
> > 
> > Thanks
> > 
> > diff --git a/src/backend/access/brin/brin_minmax_multi.c
> > b/src/backend/access/brin/brin_minmax_multi.c
> > index 70109960e8..25d6d2e274 100644
> > --- a/src/backend/access/brin/brin_minmax_multi.c
> > +++ b/src/backend/access/brin/brin_minmax_multi.c
> > @@ -2161,7 +2161,7 @@ brin_minmax_multi_distance_interval(PG_FUNCTION_ARGS)
> >      delta = 24L * 3600L * delta;
> > 
> >      /* and add the time part */
> > -    delta += result->time / (float8) 100.0;
> > +    delta += (result->time + result->zone * USECS_PER_SEC) / (float8)
> > 100.0;
> > 
> 
> That won't work, because Interval does not have a "zone" field, so this
> won't even compile.
> 
> The problem is that interval comparisons convert the value using 30 days
> per month (see interval_cmp_value), but the formula in this function
> uses 31. So either we can tweak that (seems to fix it for me), or maybe
> just switch to interval_cmp_value directly.
> 

Changing to using month of 30 days on the formula fixed it.

and I found another issue, this time involves autovacuum which makes it
a little more complicated to reproduce.

Currently the only stable way to reproduce it is using pgbench:

pgbench -i postgres
psql -c "CREATE INDEX ON pgbench_history USING brin (tid 
int4_minmax_multi_ops);" postgres
pgbench -c2 -j2 -T 300 -n postgres

Attached a backtrace

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
set = {__val = {4194304, 140730649395040, 2, 6, 6603273, 
9461368344, 
4611686018427388799, 139855637088934, 0, 281470681751456, 0, 0, 0, 
0, 0, 0}}
pid = 
tid = 
ret = 
#1  0x7f32ad593535 in __GI_abort () at abort.c:79
save_stage = 1
act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, 
sa_mask = {__val = {
  0, 0, 0, 0, 0, 139855634845685, 2, 7147272211649118304, 
7003431887686743910, 
  9461368344, 7003770563910163344, 0, 17199369337215936768, 
140730649395280, 0, 
  140730649396144}}, sa_flags = -144564240, sa_restorer = 0x0}
sigs = {__val = {32, 0 }}
#2  0x560cf7cbc5d0 in ExceptionalCondition (conditionName=0x560cf7d370c9 
"DatumGetBool(r)", 
errorType=0x560cf7d36f74 "FailedAssertion", fileName=0x560cf7d36f60 
"brin_minmax_multi.c", 
lineNumber=455) at assert.c:69
No locals.
#3  0x560cf7628767 in AssertCheckExpandedRanges (bdesc=0x560cf9914e08, 
colloid=0, attno=1, 
attr=0x7f32a442d858, ranges=0x560cf991bbf0, nranges=10) at 
brin_minmax_multi.c:455
r = 0
minval = 10
maxval = 10
i = 9
eq = 0x560cf9914ca0
lt = 0x560cf9914c40
#4  0x560cf762d17d in brin_minmax_multi_union (fcinfo=0x7ffe685dc520)
at brin_minmax_multi.c:2779
bdesc = 0x560cf9914e08
col_a = 0x560cf9923f88
col_b = 0x560cf9939898
colloid = 0
serialized_a = 0x560cf993d020
serialized_b = 0x560cf993d0b0
ranges_a = 0x560cf9923d48
ranges_b = 0x560cf9913be8
attno = 1
attr = 0x7f32a442d858
eranges = 0x560cf991bbf0
neranges = 10
cmpFn = 0x560cf9914c40
distanceFn = 0x560cf991aee0
distances = 0x560cf7d38d27
ctx = 0x560cf991bad0
oldctx = 0x560cf9923b00
#5  0x560cf7cc6ae2 in FunctionCall3Coll (flinfo=0x560cf991afb0, 
collation=0, 
arg1=94613726645768, arg2=94613726707592, arg3=94613726795928) at 
fmgr.c:1188
fcinfodata = {fcinfo = {flinfo = 0x560cf991afb0, context = 0x0, 
resultinfo = 0x0, 
fncollation = 0, isnull = false, nargs = 3, args = 0x7ffe685dc540}, 
  fcinfo_data = "\260\257\221\371\fV", '\000' , 
"\177\003\000\bN\221\371\fV\000\000\000;\222\371\fV\000\000\210?\222\371\fV\000\000\000\227\223\371\v\022\000\000\230\230\223\371\fV\000\000\000\257\221\371\fV\000"}
fcinfo = 0x7ffe685dc520
result = 17072012032
__func__ = "FunctionCall3Coll"
#6  0x560cf762541d in union_tuples (bdesc=0x560cf9914e08, a=0x560cf9923f60, 
b=0x560cf993d078)
at brin.c:1618
unionFn = 0x560cf991afb0
col_a = 0x560cf9923f88
col_b = 0x560cf9939898
opcinfo = 0x560cf9914bf0
keyno = 0
db = 0x560cf9939870
cxt = 0x560cf9939750
oldcxt = 0x560cf9923b00
#7  0x560cf7624eb2 in summarize_range (indexInfo=0x560cf99243a8, 
state=0x560cf9924e08, 
heapRel=0x7f32a442dfb8, heapBlk=460, heapNumBlks=463) at brin.c:1427
newtup = 0x560cf9924110
newsize = 72
didupdate = false
samepage = true
phbuf = 2283
phtup = 0x560cf993d078
phsz = 32
offset = 17
scanNumBlks = 2
__func__ = "summarize_range"
#8  

Re: Crash in BRIN minmax-multi indexes

2021-04-01 Thread Zhihong Yu
Hi, Tomas:
Thanks for the correction.

I think switching to interval_cmp_value() would be better (with a comment
explaining why).

Cheers

On Thu, Apr 1, 2021 at 6:23 AM Tomas Vondra 
wrote:

> On 4/1/21 3:09 PM, Zhihong Yu wrote:
> > Hi,
> > Can you try this patch ?
> >
> > Thanks
> >
> > diff --git a/src/backend/access/brin/brin_minmax_multi.c
> > b/src/backend/access/brin/brin_minmax_multi.c
> > index 70109960e8..25d6d2e274 100644
> > --- a/src/backend/access/brin/brin_minmax_multi.c
> > +++ b/src/backend/access/brin/brin_minmax_multi.c
> > @@ -2161,7 +2161,7 @@
> brin_minmax_multi_distance_interval(PG_FUNCTION_ARGS)
> >  delta = 24L * 3600L * delta;
> >
> >  /* and add the time part */
> > -delta += result->time / (float8) 100.0;
> > +delta += (result->time + result->zone * USECS_PER_SEC) / (float8)
> > 100.0;
> >
>
> That won't work, because Interval does not have a "zone" field, so this
> won't even compile.
>
> The problem is that interval comparisons convert the value using 30 days
> per month (see interval_cmp_value), but the formula in this function
> uses 31. So either we can tweak that (seems to fix it for me), or maybe
> just switch to interval_cmp_value directly.
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: Crash in BRIN minmax-multi indexes

2021-04-01 Thread Tomas Vondra
On 4/1/21 3:09 PM, Zhihong Yu wrote:
> Hi,
> Can you try this patch ?
> 
> Thanks
> 
> diff --git a/src/backend/access/brin/brin_minmax_multi.c
> b/src/backend/access/brin/brin_minmax_multi.c
> index 70109960e8..25d6d2e274 100644
> --- a/src/backend/access/brin/brin_minmax_multi.c
> +++ b/src/backend/access/brin/brin_minmax_multi.c
> @@ -2161,7 +2161,7 @@ brin_minmax_multi_distance_interval(PG_FUNCTION_ARGS)
>      delta = 24L * 3600L * delta;
> 
>      /* and add the time part */
> -    delta += result->time / (float8) 100.0;
> +    delta += (result->time + result->zone * USECS_PER_SEC) / (float8)
> 100.0;
> 

That won't work, because Interval does not have a "zone" field, so this
won't even compile.

The problem is that interval comparisons convert the value using 30 days
per month (see interval_cmp_value), but the formula in this function
uses 31. So either we can tweak that (seems to fix it for me), or maybe
just switch to interval_cmp_value directly.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Crash in BRIN minmax-multi indexes

2021-04-01 Thread Zhihong Yu
Hi,
Can you try this patch ?

Thanks

diff --git a/src/backend/access/brin/brin_minmax_multi.c
b/src/backend/access/brin/brin_minmax_multi.c
index 70109960e8..25d6d2e274 100644
--- a/src/backend/access/brin/brin_minmax_multi.c
+++ b/src/backend/access/brin/brin_minmax_multi.c
@@ -2161,7 +2161,7 @@ brin_minmax_multi_distance_interval(PG_FUNCTION_ARGS)
 delta = 24L * 3600L * delta;

 /* and add the time part */
-delta += result->time / (float8) 100.0;
+delta += (result->time + result->zone * USECS_PER_SEC) / (float8)
100.0;

 Assert(delta >= 0);

On Wed, Mar 31, 2021 at 11:25 PM Jaime Casanova <
jcasa...@systemguards.com.ec> wrote:

> On Wed, Mar 31, 2021 at 6:19 PM Jaime Casanova
>  wrote:
> >
> > On Wed, Mar 31, 2021 at 5:25 PM Tomas Vondra
> >  wrote:
> > >
> > > Hi,
> > >
> > > I think I found the issue - it's kinda obvious, really. We need to
> > > consider the timezone, because the "time" parts alone may be sorted
> > > differently. The attached patch should fix this, and it also fixes a
> > > similar issue in the inet data type.
> > >
> >
> > ah! yeah! obvious... if you say so ;)
> >
> > > As for why the regression tests did not catch this, it's most likely
> > > because the data is likely generated in "nice" ordering, or something
> > > like that. I'll see if I can tweak the ordering to trigger these issues
> > > reliably, and I'll do a bit more randomized testing.
> > >
> > > There's also the question of rounding errors, which I think might cause
> > > random assert failures (but in practice it's harmless, in the worst
> case
> > > we'll merge the ranges a bit differently).
> > >
> > >
> >
> > I can confirm this fixes the crash in the query I showed and the
> original case.
> >
>
> But I found another, but similar issue.
>
> ```
> update public.brintest_multi set
>   intervalcol = (select pg_catalog.avg(intervalcol) from
> public.brintest_bloom)
> ;
> ```
>
> BTW, i can reproduce just by executing "make installcheck" and
> immediately execute that query
>
> --
> Jaime Casanova
> Director de Servicios Profesionales
> SYSTEMGUARDS - Consultores de PostgreSQL
>


Re: Crash in BRIN minmax-multi indexes

2021-04-01 Thread Jaime Casanova
On Wed, Mar 31, 2021 at 6:19 PM Jaime Casanova
 wrote:
>
> On Wed, Mar 31, 2021 at 5:25 PM Tomas Vondra
>  wrote:
> >
> > Hi,
> >
> > I think I found the issue - it's kinda obvious, really. We need to
> > consider the timezone, because the "time" parts alone may be sorted
> > differently. The attached patch should fix this, and it also fixes a
> > similar issue in the inet data type.
> >
>
> ah! yeah! obvious... if you say so ;)
>
> > As for why the regression tests did not catch this, it's most likely
> > because the data is likely generated in "nice" ordering, or something
> > like that. I'll see if I can tweak the ordering to trigger these issues
> > reliably, and I'll do a bit more randomized testing.
> >
> > There's also the question of rounding errors, which I think might cause
> > random assert failures (but in practice it's harmless, in the worst case
> > we'll merge the ranges a bit differently).
> >
> >
>
> I can confirm this fixes the crash in the query I showed and the original 
> case.
>

But I found another, but similar issue.

```
update public.brintest_multi set
  intervalcol = (select pg_catalog.avg(intervalcol) from public.brintest_bloom)
;
```

BTW, i can reproduce just by executing "make installcheck" and
immediately execute that query

--
Jaime Casanova
Director de Servicios Profesionales
SYSTEMGUARDS - Consultores de PostgreSQL
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
set = {__val = {4194304, 140732635565408, 2, 6, 6574673, 
94040556822512, 4611686018427388799, 140208685218470, 0, 281470681751456, 0, 0, 
0, 0, 0, 0}}
pid = 
tid = 
ret = 
#1  0x7f84e0a82535 in __GI_abort () at abort.c:79
save_stage = 1
act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, 
sa_mask = {__val = {0, 0, 0, 0, 0, 140208682975221, 2, 3978988772531372052, 
7003432107538527544, 94040556822512, 
  7003767243900412736, 0, 11642884921704132352, 140732635565648, 0, 
140732635566512}}, sa_flags = -2047090704, sa_restorer = 0x0}
sigs = {__val = {32, 0 }}
#2  0x558786650fb6 in ExceptionalCondition (conditionName=0x5587866cb6c3 
"delta >= 0", errorType=0x5587866caf74 "FailedAssertion", 
fileName=0x5587866caf60 "brin_minmax_multi.c", lineNumber=2166)
at assert.c:69
No locals.
#3  0x558785fc7cd4 in brin_minmax_multi_distance_interval 
(fcinfo=0x7ffedec05220) at brin_minmax_multi.c:2166
delta = -82532.55908594
ia = 0x558787f13738
ib = 0x558787efdc20
result = 0x558787f207e0
__func__ = "brin_minmax_multi_distance_interval"
#4  0x55878665b3ae in FunctionCall2Coll (flinfo=0x558787efbbf0, 
collation=0, arg1=94040589678392, arg2=94040589589536) at fmgr.c:1163
fcinfodata = {fcinfo = {flinfo = 0x558787efbbf0, context = 0x0, 
resultinfo = 0x0, fncollation = 0, isnull = false, nargs = 2, args = 
0x7ffedec05240}, 
  fcinfo_data = "\360\273U", '\000' , 
"\002\000\070\067\361\207\207U\000\000\000\000\000\000\000\000\000\000 
\334U\000\000\000\a\362\207\207U\000"}
fcinfo = 0x7ffedec05220
result = 94040589731280
__func__ = "FunctionCall2Coll"
#5  0x558785fc65b7 in build_distances (distanceFn=0x558787efbbf0, 
colloid=0, eranges=0x558787f206f0, neranges=5) at brin_minmax_multi.c:1350
a1 = 94040589678392
a2 = 94040589589536
r = 94040556857001
i = 0
ndistances = 4
distances = 0x558787f20788
#6  0x558785fc712f in compactify_ranges (bdesc=0x558787efb428, 
ranges=0x558787f12720, max_values=32) at brin_minmax_multi.c:1820
cmpFn = 0x558787efbc28
distanceFn = 0x558787efbbf0
eranges = 0x558787f206f0
neranges = 5
distances = 0x10055870008
ctx = 0x558787f205d0
oldctx = 0x558787efd560
#7  0x558785fc8483 in brin_minmax_multi_serialize (bdesc=0x558787efb428, 
src=94040589674272, dst=0x558787efda10) at brin_minmax_multi.c:2354
ranges = 0x558787f12720
s = 0x558787efe7a8
#8  0x558785fce67b in brin_form_tuple (brdesc=0x558787efb428, blkno=3, 
tuple=0x558787efd680, size=0x7ffedec054b8) at brin_tuple.c:165
datumno = 1
values = 0x558787efe0f8
nulls = 0x558787efdc48
anynulls = false
rettuple = 0x7d7924400
keyno = 14
idxattno = 14
phony_infomask = 0
phony_nullbitmap = 0x558787efe210 "\177\177\177~\177\177\177\177"
len = 94040556891582
hoff = 1024
data_len = 94040589589704
i = 32766
untoasted_values = 0x558787efe230
nuntoasted = 0
#9  0x558785fbe9c1 in brininsert (idxRel=0x7f84d79244b0, 
values=0x7ffedec05610, nulls=0x7ffedec055f0, heaptid=0x558787eeaf80, 
heapRel=0x7f84d79110e0, checkUnique=UNIQUE_CHECK_NO, indexUnchanged=false, 
indexInfo=0x558787eeaa38) at brin.c:281
lp = 0x7f84db92dfa4
origsz = 992
newsz = 94040559734857
page = 0x7f84db92df80 

Re: Crash in BRIN minmax-multi indexes

2021-03-31 Thread Thomas Munro
On Thu, Apr 1, 2021 at 11:25 AM Tomas Vondra
 wrote:
> As for why the regression tests did not catch this, it's most likely
> because the data is likely generated in "nice" ordering, or something
> like that. I'll see if I can tweak the ordering to trigger these issues
> reliably, and I'll do a bit more randomized testing.

For what little it's worth now that you've cracked it, I can report
that make check blows up somewhere in here on a 32 bit system with
--with-blocksize=32 :-)




Re: Crash in BRIN minmax-multi indexes

2021-03-31 Thread Tomas Vondra
On 4/1/21 3:22 AM, Zhihong Yu wrote:
> Hi,
> -       delta += (float8) addrb[i] - (float8) addra[i];
> -       delta /= 256;
> ...
> +       delta /= 255;
> 
> May I know why the divisor was changed ?
> 

Yeah, that's a mistake, it should remain 256. Consider two subtractions

1.1.2.255 - 1.1.1.0 = [0, 0, 1, 255]

1.1.2.255 - 1.1.0.255 = [0, 0, 2, 0]

With the divisor being 255 those would be the same (2 * 256), but we
want the first one to be a bit smaller. It's also consistent with how
inet does subtractions:

test=# select '1.1.2.255'::inet - '1.1.0.255'::inet;
 ?column?
--
  512
(1 row)

test=# select '1.1.2.255'::inet - '1.1.1.0'::inet;
 ?column?
--
  511
(1 row)

So I'll keep the 256.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Crash in BRIN minmax-multi indexes

2021-03-31 Thread Zhihong Yu
Hi,
-   delta += (float8) addrb[i] - (float8) addra[i];
-   delta /= 256;
...
+   delta /= 255;

May I know why the divisor was changed ?

Thanks

On Wed, Mar 31, 2021 at 3:25 PM Tomas Vondra 
wrote:

> Hi,
>
> I think I found the issue - it's kinda obvious, really. We need to
> consider the timezone, because the "time" parts alone may be sorted
> differently. The attached patch should fix this, and it also fixes a
> similar issue in the inet data type.
>
> As for why the regression tests did not catch this, it's most likely
> because the data is likely generated in "nice" ordering, or something
> like that. I'll see if I can tweak the ordering to trigger these issues
> reliably, and I'll do a bit more randomized testing.
>
> There's also the question of rounding errors, which I think might cause
> random assert failures (but in practice it's harmless, in the worst case
> we'll merge the ranges a bit differently).
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: Crash in BRIN minmax-multi indexes

2021-03-31 Thread Jaime Casanova
On Wed, Mar 31, 2021 at 5:25 PM Tomas Vondra
 wrote:
>
> Hi,
>
> I think I found the issue - it's kinda obvious, really. We need to
> consider the timezone, because the "time" parts alone may be sorted
> differently. The attached patch should fix this, and it also fixes a
> similar issue in the inet data type.
>

ah! yeah! obvious... if you say so ;)

> As for why the regression tests did not catch this, it's most likely
> because the data is likely generated in "nice" ordering, or something
> like that. I'll see if I can tweak the ordering to trigger these issues
> reliably, and I'll do a bit more randomized testing.
>
> There's also the question of rounding errors, which I think might cause
> random assert failures (but in practice it's harmless, in the worst case
> we'll merge the ranges a bit differently).
>
>

I can confirm this fixes the crash in the query I showed and the original case.

-- 
Jaime Casanova
Director de Servicios Profesionales
SYSTEMGUARDS - Consultores de PostgreSQL




Re: Crash in BRIN minmax-multi indexes

2021-03-31 Thread Tomas Vondra
On 4/1/21 12:53 AM, Zhihong Yu wrote:
> Hi,
> For inet data type fix:
> 
> +       unsigned char a = addra[i];
> +       unsigned char b = addrb[i];
> +
> +       if (i >= lena)
> +           a = 0;
> +
> +       if (i >= lenb)
> +           b = 0;
> 
> Should the length check precede the addra[i] ?
> Something like:
> 
>        unsigned char a;
>        if (i >= lena) a = 0;
>        else a = addra[i];
> 

I don't think that makes any difference. We know the bytes are there, we
just want to ignore / reset them in some cases.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Crash in BRIN minmax-multi indexes

2021-03-31 Thread Zhihong Yu
Hi,
For inet data type fix:

+   unsigned char a = addra[i];
+   unsigned char b = addrb[i];
+
+   if (i >= lena)
+   a = 0;
+
+   if (i >= lenb)
+   b = 0;

Should the length check precede the addra[i] ?
Something like:

   unsigned char a;
   if (i >= lena) a = 0;
   else a = addra[i];

Cheers

On Wed, Mar 31, 2021 at 3:25 PM Tomas Vondra 
wrote:

> Hi,
>
> I think I found the issue - it's kinda obvious, really. We need to
> consider the timezone, because the "time" parts alone may be sorted
> differently. The attached patch should fix this, and it also fixes a
> similar issue in the inet data type.
>
> As for why the regression tests did not catch this, it's most likely
> because the data is likely generated in "nice" ordering, or something
> like that. I'll see if I can tweak the ordering to trigger these issues
> reliably, and I'll do a bit more randomized testing.
>
> There's also the question of rounding errors, which I think might cause
> random assert failures (but in practice it's harmless, in the worst case
> we'll merge the ranges a bit differently).
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: Crash in BRIN minmax-multi indexes

2021-03-31 Thread Tomas Vondra
Hi,

I think I found the issue - it's kinda obvious, really. We need to
consider the timezone, because the "time" parts alone may be sorted
differently. The attached patch should fix this, and it also fixes a
similar issue in the inet data type.

As for why the regression tests did not catch this, it's most likely
because the data is likely generated in "nice" ordering, or something
like that. I'll see if I can tweak the ordering to trigger these issues
reliably, and I'll do a bit more randomized testing.

There's also the question of rounding errors, which I think might cause
random assert failures (but in practice it's harmless, in the worst case
we'll merge the ranges a bit differently).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/access/brin/brin_minmax_multi.c b/src/backend/access/brin/brin_minmax_multi.c
index 70109960e8..439e0b414c 100644
--- a/src/backend/access/brin/brin_minmax_multi.c
+++ b/src/backend/access/brin/brin_minmax_multi.c
@@ -2090,7 +2090,7 @@ brin_minmax_multi_distance_timetz(PG_FUNCTION_ARGS)
 	TimeTzADT  *ta = PG_GETARG_TIMETZADT_P(0);
 	TimeTzADT  *tb = PG_GETARG_TIMETZADT_P(1);
 
-	delta = tb->time - ta->time;
+	delta = (tb->time + tb->zone * USECS_PER_SEC) - (ta->time + ta->zone * USECS_PER_SEC);
 
 	Assert(delta >= 0);
 
@@ -2292,6 +2292,9 @@ brin_minmax_multi_distance_inet(PG_FUNCTION_ARGS)
 	inet	   *ipa = PG_GETARG_INET_PP(0);
 	inet	   *ipb = PG_GETARG_INET_PP(1);
 
+	int			lena,
+lenb;
+
 	/*
 	 * If the addresses are from different families, consider them to be in
 	 * maximal possible distance (which is 1.0).
@@ -2299,20 +2302,38 @@ brin_minmax_multi_distance_inet(PG_FUNCTION_ARGS)
 	if (ip_family(ipa) != ip_family(ipb))
 		PG_RETURN_FLOAT8(1.0);
 
-	/* ipv4 or ipv6 */
-	if (ip_family(ipa) == PGSQL_AF_INET)
-		len = 4;
-	else
-		len = 16;/* NS_IN6ADDRSZ */
-
 	addra = ip_addr(ipa);
 	addrb = ip_addr(ipb);
 
+	/*
+	 * The length is calculated from the mask length, because we sort the
+	 * addresses by first address in the range, so A.B.C.D/24 < A.B.C.1
+	 * (the first range starts at A.B.C.0, which is before A.B.C.1). We
+	 * don't want to produce negative delta in this case, so we just cut
+	 * the extra bytes.
+	 *
+	 * XXX Maybe this should be a bit more careful and cut the bits, not
+	 * just whole bytes.
+	 */
+	lena = ip_bits(ipa) / 8;
+	lenb = ip_bits(ipb) / 8;
+
+	len = Max(lena, lenb);
+
 	delta = 0;
 	for (i = len - 1; i >= 0; i--)
 	{
-		delta += (float8) addrb[i] - (float8) addra[i];
-		delta /= 256;
+		unsigned char a = addra[i];
+		unsigned char b = addrb[i];
+
+		if (i >= lena)
+			a = 0;
+
+		if (i >= lenb)
+			b = 0;
+
+		delta += (float8) b - (float8) a;
+		delta /= 255;
 	}
 
 	Assert((delta >= 0) && (delta <= 1));


Re: Crash in BRIN minmax-multi indexes

2021-03-31 Thread Tomas Vondra
On 3/31/21 8:20 PM, Zhihong Yu wrote:
> Hi,
> In build_distances():
> 
>         a1 = eranges[i].maxval;
>         a2 = eranges[i + 1].minval;
> 
> It seems there was overlap between the successive ranges, leading to
> delta = -678500
> 

I've been unable to reproduce this, so far :-( How exactly did you
manage to reproduce it?


The thing is - how could there be an overlap? The way we build expanded
ranges that should not be possible, I think. Can you print the ranges at
the end of fill_expanded_ranges? That should shed some light on this.


FWIW I suspect those asserts on delta may be a bit problematic due to
rounding errors. And I found one issue in the inet distance function,
because apparently

test=# select '10.2.14.243/24'::inet < '10.2.14.231/24'::inet;
 ?column?
--
 f
(1 row)

but the delta formula currently ignores the mask. But that's a separate
issue.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Crash in BRIN minmax-multi indexes

2021-03-31 Thread Zhihong Yu
Hi,
In build_distances():

a1 = eranges[i].maxval;
a2 = eranges[i + 1].minval;

It seems there was overlap between the successive ranges, leading to
delta = -678500

FYI

On Wed, Mar 31, 2021 at 10:30 AM Jaime Casanova <
jcasa...@systemguards.com.ec> wrote:

> Hi,
>
> Just found $SUBJECT involving time with time zone and a subselect. I
> still don't have narrowed to the exact table/index minimal schema but
> if you run this query on the regression database it will creash.
>
> ```
> update public.brintest_multi set
>   timetzcol = (select tz from generate_series('2021-01-01'::timestamp
> with time zone, '2021-01-31', '5 days') tz limit 1)
> ;
> ```
>
> attached a backtrace. Let me know if you need extra information.
>
> --
> Jaime Casanova
> Director de Servicios Profesionales
> SYSTEMGUARDS - Consultores de PostgreSQL
>