[HACKERS] tsearch profiling - czech environment - take 55MB

2010-03-11 Thread Pavel Stehule
Hello

There is something wrong in our NISortDictionary implementation. After
initialisation, the ts_cache memory context is 55MB and pg takes
190MB.

dispell_init
cspell: 1024 total in 1 blocks; 136 free (1 chunks); 888 used
After dictionary loading
cspell: 3072 total in 2 blocks; 568 free (5 chunks); 2504 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864
free (12 chunks); 19904424 used
After AffFile loading
cspell: 816952 total in 78 blocks; 18072 free (18 chunks); 798880 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864
free (20 chunks); 19904424 used
After stop words loading
cspell: 816952 total in 78 blocks; 13360 free (13 chunks); 803592 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864
free (20 chunks); 19904424 used
After dictionary sort
cspell: 55706480 total in 6775 blocks; 140200 free (1728 chunks); 55566280 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864
free (20 chunks); 19904424 used
After Affixes sort
cspell: 55853736 total in 6789 blocks; 130208 free (1553 chunks); 55723528 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864
free (34 chunks); 19904424 used
final
cspell: 55853736 total in 6789 blocks; 130208 free (1553 chunks); 55723528 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864
free (34 chunks); 19904424 used

Regards
Pavel Stehule



Re: [HACKERS] tsearch profiling - czech environment - take 55MB

2010-03-11 Thread Pavel Stehule
2010/3/11 Pavel Stehule pavel.steh...@gmail.com:
 Hello

 There is something wrong in our NISortDictionary implementation. After
 initialisation, the ts_cache memory context is 55MB and pg takes
 190MB.

 dispell_init
 cspell: 1024 total in 1 blocks; 136 free (1 chunks); 888 used
 After dictionary loading
 cspell: 3072 total in 2 blocks; 568 free (5 chunks); 2504 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864
 free (12 chunks); 19904424 used
 After AffFile loading
 cspell: 816952 total in 78 blocks; 18072 free (18 chunks); 798880 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864
 free (20 chunks); 19904424 used
 After stop words loading
 cspell: 816952 total in 78 blocks; 13360 free (13 chunks); 803592 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864
 free (20 chunks); 19904424 used
 After dictionary sort
 cspell: 55706480 total in 6775 blocks; 140200 free (1728 chunks); 55566280 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864
 free (20 chunks); 19904424 used
 After Affixes sort
 cspell: 55853736 total in 6789 blocks; 130208 free (1553 chunks); 55723528 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864
 free (34 chunks); 19904424 used
 final
 cspell: 55853736 total in 6789 blocks; 130208 free (1553 chunks); 55723528 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864
 free (34 chunks); 19904424 used


the mkSPNode call takes 45MB:

Conf->Dictionary = mkSPNode(Conf, 0, Conf->nspell, 0);

 Regards
 Pavel Stehule




Re: [HACKERS] tsearch profiling - czech environment - take 55MB

2010-03-11 Thread Tom Lane
Pavel Stehule pavel.steh...@gmail.com writes:
 There is something wrong in our NISortDictionary implementation. After
 initialisation, the ts_cache memory context is 55MB and pg takes
 190MB.

What's your tsearch configuration exactly?

regards, tom lane



Re: [HACKERS] tsearch profiling - czech environment - take 55MB

2010-03-11 Thread Pavel Stehule
2010/3/11 Tom Lane t...@sss.pgh.pa.us:
 Pavel Stehule pavel.steh...@gmail.com writes:
 There is something wrong in our NISortDictionary implementation. After
 initialisation, the ts_cache memory context is 55MB and pg takes
 190MB.

 What's your tsearch configuration exactly?


files: http://www.pgsql.cz/data/czech.tar.gz

configuration:

CREATE TEXT SEARCH DICTIONARY cspell
   (template=ispell, dictfile = czech, afffile=czech, stopwords=czech);
CREATE TEXT SEARCH CONFIGURATION cs (copy=english);
ALTER TEXT SEARCH CONFIGURATION cs
   ALTER MAPPING FOR word, asciiword WITH cspell, simple;

then try: select * from ts_debug('cs','Příliš žluťoučký kůň se napil
žluté vody');

with timing added (using the clock() function):

cspell: 1024 total in 1 blocks; 136 free (1 chunks); 888 used
After dictionary loading 32
cspell: 3072 total in 2 blocks; 568 free (5 chunks); 2504 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864
free (12 chunks); 19904424 used
After AffFile loading 33
cspell: 816952 total in 78 blocks; 18072 free (18 chunks); 798880 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864
free (20 chunks); 19904424 used
After stop words loading 33
cspell: 816952 total in 78 blocks; 13360 free (13 chunks); 803592 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864
free (20 chunks); 19904424 used
** 1 **
cspell: 816952 total in 78 blocks; 9240 free (12 chunks); 807712 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864
free (20 chunks); 19904424 used
** 2 ** 38
cspell: 825144 total in 79 blocks; 8440 free (10 chunks); 816704 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864
free (20 chunks); 19904424 used
** 2.5 ** 49
// mkSPNode
cspell: 825144 total in 79 blocks; 8440 free (10 chunks); 816704 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864
free (20 chunks); 19904424 used
** 3 ** 58
cspell: 55706480 total in 6775 blocks; 140200 free (1728 chunks); 55566280 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864
free (20 chunks); 19904424 used
After dictionary sort 58
cspell: 55706480 total in 6775 blocks; 140200 free (1728 chunks); 55566280 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864
free (20 chunks); 19904424 used
After Affixes sort 58
cspell: 55853736 total in 6789 blocks; 130208 free (1553 chunks); 55723528 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864
free (34 chunks); 19904424 used
final 58
cspell: 55853736 total in 6789 blocks; 130208 free (1553 chunks); 55723528 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864
free (34 chunks); 19904424 used
executor start





Re: [HACKERS] tsearch profiling - czech environment - take 55MB

2010-03-11 Thread Pavel Stehule
2010/3/11 Pavel Stehule pavel.steh...@gmail.com:
 2010/3/11 Tom Lane t...@sss.pgh.pa.us:
 Pavel Stehule pavel.steh...@gmail.com writes:
 There is something wrong in our NISortDictionary implementation. After
 initialisation, the ts_cache memory context is 55MB and pg takes
 190MB.

 What's your tsearch configuration exactly?


I am on 64-bit Linux.

The problem is the very large number of small allocations - there are 853215 nodes.
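
(Rough arithmetic, assuming the standard aset.c allocator on 64-bit: each
palloc chunk carries roughly 16 bytes of header and small requests are
rounded up to the next power of 2, so 853215 separate node allocations can
add tens of MB of pure bookkeeping and rounding waste on top of the payload.)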

The memory can be minimized with a simple block (arena) allocator:

static char *data = NULL;
static size_t allocated = 0;

static void
binit(void)
{
	data = NULL;
	allocated = 0;
}

static char *
balloc(size_t size)
{
	char	   *result;

	/* grab a fresh 100kB block when the current one is exhausted */
	/* (assumes requests are far smaller than the block size) */
	if (data == NULL || size > allocated)
	{
		data = palloc(1024 * 100);
		allocated = 1024 * 100;
	}

	/* carve the request off the front of the current block */
	result = data;
	data += size;
	allocated -= size;
	memset(result, 0, size);

	return result;
}

I replaced palloc0 inside mkSPNode with balloc:

cspell: 25626352 total in 349 blocks; 11048 free (2 chunks); 25615304 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864
free (34 chunks); 19904424 used

versus

cspell: 55853736 total in 6789 blocks; 130208 free (1553 chunks); 55723528 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864
free (34 chunks); 19904424 used

Regards
Pavel



Re: [HACKERS] tsearch profiling - czech environment - take 55MB

2010-03-11 Thread Tom Lane
Pavel Stehule pavel.steh...@gmail.com writes:
 The problem is the very large number of small allocations - there are 853215 nodes.
 I replaced palloc0 inside mkSPNode with balloc

This goes back to the idea we've discussed from time to time of having a
variant memory context type in which pfree() is a no-op and we dispense
with all the per-chunk overhead.  I guess that if there really isn't any
overhead there then pfree/repalloc would actually crash :-( but for the
particular case of dictionaries that would probably be OK because
there's so little code that touches them.
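
For concreteness, a minimal standalone sketch of that idea (illustrative
names only, not the actual MemoryContext API): allocations bump a pointer
inside large blocks, there is no per-chunk header at all, and freeing an
individual chunk is deliberately a no-op.

#include <stdlib.h>

typedef struct ArenaBlock
{
	struct ArenaBlock *next;	/* newest-first list of blocks */
	size_t		used;			/* bytes handed out from this block */
	size_t		size;			/* usable bytes in data[] */
	char		data[];			/* the allocation space itself */
} ArenaBlock;

typedef struct Arena
{
	ArenaBlock *blocks;
	size_t		blocksize;		/* e.g. 8192 or larger */
} Arena;

static void *
arena_alloc(Arena *a, size_t size)
{
	ArenaBlock *b = a->blocks;

	/* keep 8-byte alignment; this is the only per-chunk "overhead" */
	size = (size + 7) & ~(size_t) 7;

	if (b == NULL || b->size - b->used < size)
	{
		size_t		bs = (size > a->blocksize) ? size : a->blocksize;

		/* (malloc failure handling omitted for brevity) */
		b = malloc(sizeof(ArenaBlock) + bs);
		b->next = a->blocks;
		b->used = 0;
		b->size = bs;
		a->blocks = b;
	}

	b->used += size;
	return b->data + b->used - size;
}

/* the pfree() analogue: deliberately does nothing */
static void
arena_free(void *ptr)
{
	(void) ptr;
}

/* releasing the whole arena is the only way to get memory back */
static void
arena_reset(Arena *a)
{
	while (a->blocks)
	{
		ArenaBlock *next = a->blocks->next;

		free(a->blocks);
		a->blocks = next;
	}
}

Since there is no header from which to find a chunk's owning block,
repalloc() on such a chunk is impossible too, which is the crash hazard
mentioned above.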

regards, tom lane



Re: [HACKERS] tsearch profiling - czech environment - take 55MB

2010-03-11 Thread Pavel Stehule
2010/3/11 Tom Lane t...@sss.pgh.pa.us:
 Pavel Stehule pavel.steh...@gmail.com writes:
  The problem is the very large number of small allocations - there are 853215 nodes.
  I replaced palloc0 inside mkSPNode with balloc

 This goes back to the idea we've discussed from time to time of having a
 variant memory context type in which pfree() is a no-op and we dispense
 with all the per-chunk overhead.  I guess that if there really isn't any
 overhead there then pfree/repalloc would actually crash :-( but for the
 particular case of dictionaries that would probably be OK because
 there's so little code that touches them.

That makes sense. I was surprised by how much memory is necessary :(.
Smarter allocation saves 50% - 2.5GB across 100 users - which is
important, but I think this data has to be shared. I had hoped for
preloading, but that is problematic: there is no data available at
shared-preload time, and the allocated size is too big.

Pavel





Re: [HACKERS] tsearch profiling - czech environment - take 55MB

2010-03-11 Thread Alvaro Herrera
Pavel Stehule wrote:
 2010/3/11 Tom Lane t...@sss.pgh.pa.us:
  Pavel Stehule pavel.steh...@gmail.com writes:
  The problem is the very large number of small allocations - there are 853215 nodes.
  I replaced palloc0 inside mkSPNode with balloc
 
  This goes back to the idea we've discussed from time to time of having a
  variant memory context type in which pfree() is a no-op and we dispense
  with all the per-chunk overhead.  I guess that if there really isn't any
  overhead there then pfree/repalloc would actually crash :-( but for the
  particular case of dictionaries that would probably be OK because
  there's so little code that touches them.
 
 That makes sense. I was surprised by how much memory is necessary :(.
 Smarter allocation saves 50% - 2.5GB across 100 users - which is
 important, but I think this data has to be shared. I had hoped for
 preloading, but that is problematic: there is no data available at
 shared-preload time, and the allocated size is too big.

Could it be mmapped and shared that way?
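
(A sketch of the mechanism, under the assumption that the compiled
dictionary could be serialized into a flat, pointer-free file once and
then mapped read-only by every backend; the file layout and function
name here are hypothetical:)

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map a precompiled dictionary image.  Read-only MAP_SHARED pages backed
 * by the same file are shared by all processes that map it, so each
 * backend costs address space but no extra physical memory.
 */
static void *
map_dict_image(const char *path, size_t *len)
{
	struct stat st;
	void	   *p;
	int			fd = open(path, O_RDONLY);

	if (fd < 0)
		return NULL;
	if (fstat(fd, &st) < 0)
	{
		close(fd);
		return NULL;
	}
	p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);					/* the mapping survives the close */
	if (p == MAP_FAILED)
		return NULL;
	*len = (size_t) st.st_size;
	return p;
}

The catch is that the node tree would have to use offsets rather than
pointers, since the image cannot be assumed to map at the same address
in every process.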

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] tsearch profiling - czech environment - take 55MB

2010-03-11 Thread Pavel Stehule
2010/3/11 Alvaro Herrera alvhe...@commandprompt.com:
 Pavel Stehule wrote:
 2010/3/11 Tom Lane t...@sss.pgh.pa.us:
  Pavel Stehule pavel.steh...@gmail.com writes:
  The problem is in very large small allocations - there are 853215 nodes.
  I replaced palloc0 inside mkSPnode by balloc
 
  This goes back to the idea we've discussed from time to time of having a
  variant memory context type in which pfree() is a no-op and we dispense
  with all the per-chunk overhead.  I guess that if there really isn't any
  overhead there then pfree/repalloc would actually crash :-( but for the
  particular case of dictionaries that would probably be OK because
  there's so little code that touches them.

 That makes sense. I was surprised by how much memory is necessary :(.
 Smarter allocation saves 50% - 2.5GB across 100 users - which is
 important, but I think this data has to be shared. I had hoped for
 preloading, but that is problematic: there is no data available at
 shared-preload time, and the allocated size is too big.

 Could it be mmapped and shared that way?

I don't know - I have never worked with mmap.

Pavel


 --
 Alvaro Herrera                                http://www.CommandPrompt.com/
 The PostgreSQL Company - Command Prompt, Inc.

