"Naik, Sameer" <sameer_n...@bmc.com> writes:
> On Mon, May 20, 2019 at 09:37:34PM +0000, Deepak Somaiya wrote:
>> wow this is interesting!
>> @Tom, Bruce, David - Experts
>> Any idea why would changing the datatype would cause so much degradation - 
>> this is even when plan remains the same ,data is same.

I see nothing very exciting here.  text equality comparison reduces to
a memcmp, while citext equality comparison is quite expensive, since
it has to case-fold both inputs before it can memcmp them.

For the given test case:

> '        ->  Index Scan using i776_0_400129200_t776 on t776  
> (cost=0.42..12.66 rows=1 width=52) (actual time=1187.686..5531.421 rows=48 
> loops=1)'
> '              Index Cond: ((c400129200 = $1) AND (c400127400 = $2))'
> '              Filter: (((c400129100 <> $3) OR (c400129100 IS NULL)) AND 
> ((c179 = $4) OR (c179 = $5) OR (c179 = $6) OR (c179 = $7) OR (c179 = $8) OR 
> (c179 = $9) OR (c179 = $10) OR (c179 = $11) OR (c179 = $12) OR (c179 = $13) 
> OR (c179 = $14) OR (c179 = $15) OR (c179 = $16) OR (c179 = $17) OR (c179 = 
> $18) OR (c179 = $19) OR (c179 = $20) OR (c179 = $21) OR (c179 = $22) OR (c179 
> = $23) OR (c179 = $24) OR (c179 = $25) OR (c179 = $26) OR (c179 = $27) OR 
> (c179 = $28) OR (c179 = $29) OR (c179 = $30) OR (c179 = $31) OR (c179 = $32) 
> OR (c179 = $33) OR (c179 = $34) OR (c179 = $35) OR (c179 = $36) OR (c179 = 
> $37) OR (c179 = $38) OR (c179 = $39) OR (c179 = $40) OR (c179 = $41) OR (c179 
> = $42) OR (c179 = $43) OR (c179 = $44) OR (c179 = $45) OR (c179 = $46) OR 
> (c179 = $47) OR (c179 = $48) OR (c179 = $49) OR (c179 = $50) OR (c179 = 
> $51)))'
> '              Rows Removed by Filter: 55322'

it's reasonable to suppose that not many of the rows are failing the
c400129100 conditions, so that in order to decide that a row doesn't
pass the filter, we are forced to perform each of the OR'd c179
comparisons.  So this query did something like 48 * 55322 equality
comparisons for c179.  If the cost of a citexteq evaluation is
around 2 microseconds, that'd fully explain the runtime differential.

The OP didn't say what locale or encoding he's using.  Maybe switching
to some other settings would improve matters ... though if non-ASCII
case folding is a business requirement, that likely won't go far.

Or you could get rid of the need for the repetitive case-folding,
say by storing lower(c179) in a separate column and doing plain
text comparisons to pre-lowercased input values.

                        regards, tom lane


Reply via email to