[HACKERS] tsearch profiling - czech environment - take 55MB
Hello

There is something wrong in our implementation of NISortDictionary. After initialisation the ts_cache memory context is 55MB and pg takes 190MB.

dispell_init
  cspell: 1024 total in 1 blocks; 136 free (1 chunks); 888 used

After dictionary loading
  cspell: 3072 total in 2 blocks; 568 free (5 chunks); 2504 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (12 chunks); 19904424 used

After AffFile loading
  cspell: 816952 total in 78 blocks; 18072 free (18 chunks); 798880 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (20 chunks); 19904424 used

After stop words loading
  cspell: 816952 total in 78 blocks; 13360 free (13 chunks); 803592 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (20 chunks); 19904424 used

After dictionary sort
  cspell: 55706480 total in 6775 blocks; 140200 free (1728 chunks); 55566280 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (20 chunks); 19904424 used

After Affixes sort
  cspell: 55853736 total in 6789 blocks; 130208 free (1553 chunks); 55723528 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (34 chunks); 19904424 used

final
  cspell: 55853736 total in 6789 blocks; 130208 free (1553 chunks); 55723528 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (34 chunks); 19904424 used

Regards
Pavel Stehule

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tsearch profiling - czech environment - take 55MB
2010/3/11 Pavel Stehule <pavel.steh...@gmail.com>:
> There is something wrong in our implementation of NISortDictionary.
> After initialisation the ts_cache memory context is 55MB and pg takes
> 190MB.
>
> final
>   cspell: 55853736 total in 6789 blocks; 130208 free (1553 chunks); 55723528 used
>   Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (34 chunks); 19904424 used

The mkSPNode call takes 45MB:

    Conf->Dictionary = mkSPNode(Conf, 0, Conf->nspell, 0);

Regards
Pavel Stehule
Re: [HACKERS] tsearch profiling - czech environment - take 55MB
Pavel Stehule <pavel.steh...@gmail.com> writes:
> There is something wrong in our implementation of NISortDictionary.
> After initialisation the ts_cache memory context is 55MB and pg takes
> 190MB.

What's your tsearch configuration exactly?

			regards, tom lane
Re: [HACKERS] tsearch profiling - czech environment - take 55MB
2010/3/11 Tom Lane <t...@sss.pgh.pa.us>:
> What's your tsearch configuration exactly?
>
> 			regards, tom lane

files: http://www.pgsql.cz/data/czech.tar.gz

configuration:

    CREATE TEXT SEARCH DICTIONARY cspell
        (template = ispell, dictfile = czech, afffile = czech, stopwords = czech);
    CREATE TEXT SEARCH CONFIGURATION cs (copy = english);
    ALTER TEXT SEARCH CONFIGURATION cs
        ALTER MAPPING FOR word, asciiword WITH cspell, simple;

then try:

    select * from ts_debug('cs','Příliš žluťoučký kůň se napil žluté vody');

with timings (measured with the clock() function):

  cspell: 1024 total in 1 blocks; 136 free (1 chunks); 888 used

After dictionary loading 32
  cspell: 3072 total in 2 blocks; 568 free (5 chunks); 2504 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (12 chunks); 19904424 used

After AffFile loading 33
  cspell: 816952 total in 78 blocks; 18072 free (18 chunks); 798880 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (20 chunks); 19904424 used

After stop words loading 33
  cspell: 816952 total in 78 blocks; 13360 free (13 chunks); 803592 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (20 chunks); 19904424 used

** 1 **
  cspell: 816952 total in 78 blocks; 9240 free (12 chunks); 807712 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (20 chunks); 19904424 used

** 2 ** 38
  cspell: 825144 total in 79 blocks; 8440 free (10 chunks); 816704 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (20 chunks); 19904424 used

** 2.5 ** 49  // mkSPNode
  cspell: 825144 total in 79 blocks; 8440 free (10 chunks); 816704 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (20 chunks); 19904424 used

** 3 ** 58
  cspell: 55706480 total in 6775 blocks; 140200 free (1728 chunks); 55566280 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (20 chunks); 19904424 used

After dictionary sort 58
  cspell: 55706480 total in 6775 blocks; 140200 free (1728 chunks); 55566280 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (20 chunks); 19904424 used

After Affixes sort 58
  cspell: 55853736 total in 6789 blocks; 130208 free (1553 chunks); 55723528 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (34 chunks); 19904424 used

final 58
  cspell: 55853736 total in 6789 blocks; 130208 free (1553 chunks); 55723528 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (34 chunks); 19904424 used

executor start

Regards
Pavel Stehule
Re: [HACKERS] tsearch profiling - czech environment - take 55MB
2010/3/11 Pavel Stehule <pavel.steh...@gmail.com>:
> 2010/3/11 Tom Lane <t...@sss.pgh.pa.us>:
>> What's your tsearch configuration exactly?

I have a 64-bit Linux. The problem is the very large number of small
allocations - there are 853215 nodes. The memory use can be reduced
with block allocation:

    static char  *data = NULL;
    static size_t allocated = 0;

    static void
    binit(void)
    {
        data = NULL;
        allocated = 0;
    }

    static char *
    balloc(size_t size)
    {
        char *result;

        if (data == NULL || size > allocated)
        {
            data = palloc(1024 * 100);
            allocated = 1024 * 100;
        }

        result = data;
        data += size;
        allocated -= size;
        memset(result, 0, size);

        return result;
    }

I replaced palloc0 inside mkSPNode by balloc:

  cspell: 25626352 total in 349 blocks; 11048 free (2 chunks); 25615304 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (34 chunks); 19904424 used

versus

  cspell: 55853736 total in 6789 blocks; 130208 free (1553 chunks); 55723528 used
  Ispell dictionary init context: 27615288 total in 13 blocks; 7710864 free (34 chunks); 19904424 used

Regards
Pavel
Re: [HACKERS] tsearch profiling - czech environment - take 55MB
Pavel Stehule <pavel.steh...@gmail.com> writes:
> The problem is the very large number of small allocations - there are
> 853215 nodes. I replaced palloc0 inside mkSPNode by balloc.

This goes back to the idea we've discussed from time to time of having
a variant memory context type in which pfree() is a no-op and we
dispense with all the per-chunk overhead. I guess that if there really
isn't any overhead there then pfree/repalloc would actually crash :-(
but for the particular case of dictionaries that would probably be OK
because there's so little code that touches them.

			regards, tom lane
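The variant context type described above can be sketched as a plain arena allocator. This is a hypothetical illustration, not PostgreSQL's MemoryContext API: chunks carry no per-chunk header, freeing individual chunks is simply unsupported, and the whole arena would be released at once.

```c
#include <stdlib.h>
#include <string.h>
#include <stddef.h>

/* Minimal arena sketch of a "no pfree" context. A real implementation
 * would chain the blocks so they can all be released together; here an
 * exhausted block is simply abandoned to keep the example short. */
typedef struct Arena
{
    char   *cur;        /* next free byte in the current block */
    size_t  remaining;  /* bytes left in the current block */
} Arena;

#define ARENA_BLOCK_SIZE (100 * 1024)

static void *
arena_alloc(Arena *a, size_t size)
{
    void *result;

    /* keep returned pointers 8-byte aligned */
    size = (size + 7) & ~(size_t) 7;

    /* assumes size <= ARENA_BLOCK_SIZE */
    if (a->cur == NULL || size > a->remaining)
    {
        a->cur = malloc(ARENA_BLOCK_SIZE);
        a->remaining = ARENA_BLOCK_SIZE;
    }

    result = a->cur;
    a->cur += size;
    a->remaining -= size;
    memset(result, 0, size);    /* palloc0-like zeroing */
    return result;
}
```

The design trade-off Tom describes is visible here: with no header in front of each chunk there is nothing for a pfree() or repalloc() to consult, so any code path that tried them would misbehave, which is tolerable only because so little code touches dictionary data.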
Re: [HACKERS] tsearch profiling - czech environment - take 55MB
2010/3/11 Tom Lane <t...@sss.pgh.pa.us>:
> This goes back to the idea we've discussed from time to time of having
> a variant memory context type in which pfree() is a no-op and we
> dispense with all the per-chunk overhead.

That makes sense. I was surprised how much memory is necessary :(.
Smarter allocation saves 50%, which matters (2.5GB for 100 users), but
I think these data have to be shared. I believed in preloading, but it
is problematic: there are no data at shared preload time, and the
allocated size is too big.

Pavel
Re: [HACKERS] tsearch profiling - czech environment - take 55MB
Pavel Stehule wrote:
> Smarter allocation saves 50%, which matters (2.5GB for 100 users), but
> I think these data have to be shared. I believed in preloading, but it
> is problematic: there are no data at shared preload time, and the
> allocated size is too big.

Could it be mmapped and shared that way?

--
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] tsearch profiling - czech environment - take 55MB
2010/3/11 Alvaro Herrera <alvhe...@commandprompt.com>:
> Could it be mmapped and shared that way?

I don't know - I have never worked with mmap.

Pavel