Re: [HACKERS] pg_trgm Memory Allocation logic

2015-03-10 Thread Beena Emerson
Hello,

 If you manually set RPADDING 2 in trgm.h, then it will, but the 
 allocation probably should use LPADDING/RPADDING to get it right, rather 
 than assume the max values.

Yes you are right. For RPADDING = 2, the current formula is suitable but for
RPADDING =1, a lot of extra space is allocated.

IIUC, for each word the total number of trigrams is: (word_length + LPADDING
+ RPADDING - 2).
Also the maximum number of words a string can have is: (slen +1)/2
(considering each word has length of 1)

I think (slen + (slen + 1)/2 * (LPADDING + RPADDING - 3) + 1) allocates only
the necessary memory for different values of RPADDING.

Regards,

Beena Emerson







-

--

Beena Emerson

--
View this message in context: 
http://postgresql.nabble.com/pg-trgm-Memory-Allocation-logic-tp5841088p5841238.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_trgm Memory Allocation logic

2015-03-09 Thread Alvaro Herrera
Beena Emerson wrote:
 In the pg_trgm module, within function generate_trgm, the memory for trigrams
 is allocated as follows:
 
 trg = (TRGM *) palloc(TRGMHDRSIZE + sizeof(trgm) * (slen / 2 + 1) *3);
 
 I have been trying to understand why this is so because it seems to be
 allocating more space than that is required.

Maybe it's considering a worst-case for multibyte characteres?  I don't
really know if trgm supports multibyte, but I assume it does.  If it
does, then probably the trigrams consist of chars, not bytes.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_trgm Memory Allocation logic

2015-03-09 Thread Heikki Linnakangas

On 03/09/2015 02:54 PM, Alvaro Herrera wrote:

Beena Emerson wrote:

In the pg_trgm module, within function generate_trgm, the memory for trigrams
is allocated as follows:

trg = (TRGM *) palloc(TRGMHDRSIZE + sizeof(trgm) * (slen / 2 + 1) *3);

I have been trying to understand why this is so because it seems to be
allocating more space than that is required.


Maybe it's considering a worst-case for multibyte characteres?  I don't
really know if trgm supports multibyte, but I assume it does.  If it
does, then probably the trigrams consist of chars, not bytes.


Nope. Trigrams are always three bytes, even ones containing multibyte 
characters. If there are any multibyte characters in the trigram, we 
store a 3-byte checksum of the three characters instead. That loses some 
information, you can have a collision where one multibyte trigram 
incorrectly matches another one, but the trigram algorithms are 
generally not too concerned about exact results.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_trgm Memory Allocation logic

2015-03-09 Thread Tom Lane
Beena Emerson memissemer...@gmail.com writes:
 In the pg_trgm module, within function generate_trgm, the memory for trigrams
 is allocated as follows:

 trg = (TRGM *) palloc(TRGMHDRSIZE + sizeof(trgm) * (slen / 2 + 1) *3);

 I have been trying to understand why this is so because it seems to be
 allocating more space than that is required.

Consider input like 'X X X X X'.  Each X produces 3 trigrams.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_trgm Memory Allocation logic

2015-03-09 Thread Beena Emerson
In the pg_trgm module, within function generate_trgm, the memory for trigrams
is allocated as follows:

trg = (TRGM *) palloc(TRGMHDRSIZE + sizeof(trgm) * (slen / 2 + 1) *3);

I have been trying to understand why this is so because it seems to be
allocating more space than that is required.

The following table shows the palloced size [(slen / 2 + 1) *3] and the
actual trgm count for different string length.

slenpalloc size actual trgm count
2   6   3
26  42  27
38  60  39

Can somebody please explain this to me.


I had tried changing the allocation to slen + 1 and it seemed to be working
without any problem.

trg = (TRGM *) palloc(TRGMHDRSIZE + sizeof(trgm) * (slen + 1)); 


Maybe I am missisng some scenarios. 

Any help would be appreciated.

Thank you,

Beena Emerson



-

--

Beena Emerson

--
View this message in context: 
http://postgresql.nabble.com/pg-trgm-Memory-Allocation-logic-tp5841088.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_trgm Memory Allocation logic

2015-03-09 Thread Heikki Linnakangas

On 03/09/2015 03:33 PM, Tom Lane wrote:

Beena Emerson memissemer...@gmail.com writes:

In the pg_trgm module, within function generate_trgm, the memory for trigrams
is allocated as follows:



trg = (TRGM *) palloc(TRGMHDRSIZE + sizeof(trgm) * (slen / 2 + 1) *3);



I have been trying to understand why this is so because it seems to be
allocating more space than that is required.


Consider input like 'X X X X X'.  Each X produces 3 trigrams.


No it won't. Only two:

postgres=# select show_trgm('a b c');
   show_trgm
---
 {  a,  b,  c, a , b , c }
(1 row)

If you manually set RPADDING 2 in trgm.h, then it will, but the 
allocation probably should use LPADDING/RPADDING to get it right, rather 
than assume the max values.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers