Re: [PERFORM] Postgres array parser
Hello,

For such cases (see below), it would be nice to have an unnest function that only affects the first array dimension.

Something like

    unnest(ARRAY[[1,2],[2,3]], SLICE=1)
    =>
     unnest
    --------
     [1,2]
     [2,3]

With this function, I imagine that the following SQL function might beat the plpgsql FOREACH version:

    CREATE OR REPLACE FUNCTION input_value_un (in_inputs numeric[], in_input_nr numeric)
    RETURNS numeric AS
    $BODY$
        SELECT u[1][2]
        FROM unnest($1, SLICE=1) u
        WHERE u[1][1] = in_input_nr
        LIMIT 1;
    $BODY$
    LANGUAGE sql IMMUTABLE;

best regards,

Marc Mamin

> -----Original Message-----
> From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Pavel Stehule
> Sent: Tuesday, 13 December 2011 15:43
> To: Aleksej Trofimov
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Postgres array parser
>
> Hello
>
> 2011/12/13 Aleksej Trofimov:
> > We have tried the FOREACH syntax, but we have noticed a performance degradation:
> > Function with FOR: 203 ms
> > Function with FOREACH: ~250 ms
> >
> > Here is the functions' code:
> >
> > CREATE OR REPLACE FUNCTION input_value_fe(in_inputs numeric[], in_input_nr numeric)
> > RETURNS numeric AS
> > $BODY$
> > declare i numeric[];
> > BEGIN
> >     FOREACH i SLICE 1 IN ARRAY in_inputs
> >     LOOP
> >         if i[1] = in_input_nr then
> >             return i[2];
> >         end if;
> >     END LOOP;
> >
> >     return null;
> > END;
> > $BODY$
> > LANGUAGE plpgsql VOLATILE
> > COST 100;
> >
> > CREATE OR REPLACE FUNCTION input_value(in_inputs numeric[], in_input_nr numeric)
> > RETURNS numeric AS
> > $BODY$
> > declare
> >     size int;
> > BEGIN
> >     size = array_upper(in_inputs, 1);
> >     IF size IS NOT NULL THEN
> >         FOR i IN 1 .. size LOOP
> >             if in_inputs[i][1] = in_input_nr then
> >                 return in_inputs[i][2];
> >             end if;
> >         END LOOP;
> >     END IF;
> >
> >     return null;
> > END;
> > $BODY$
> > LANGUAGE plpgsql VOLATILE
> > COST 100;
> >
> > On 12/13/2011 04:02 PM, Pavel Stehule wrote:
> >> Hello
> >>
> >> do you know the FOREACH IN ARRAY statement in 9.1?
> >>
> >> It significantly accelerates iteration over arrays:
> >> http://www.depesz.com/index.php/2011/03/07/waiting-for-9-1-foreach-in-array/
> >>
> >> 2011/12/13 Aleksej Trofimov:
> >>> Hello, I wanted to ask about a problem we have been facing.
> >>> We widely use Postgres arrays as key->value arrays, like this:
> >>>
> >>> {{1,5},{2,6},{3,7}}
> >>>
> >>> where 1,2,3 are keys and 5,6,7 are values. In our pl/pgSQL functions we use
> >>> a self-written array_input(array::numeric[], key::numeric) function which
> >>> loops over the whole array and searches for the key, like:
> >>>
> >>> FOR i IN 1 .. size LOOP
> >>>     if array[i][1] = key then
> >>>         return array[i][2];
> >>>     end if;
> >>> END LOOP;
> >>>
> >>> This was a good solution until our arrays and database grew, and now the
> >>> FOR loop takes a lot of time to find a value in an array.
> >>>
> >>> My question is: how could this performance problem be solved? We have tried
> >>> pl/perl for string parsing, but it takes much more time than our current
> >>> solution. We are also thinking about a self-written C++ function; maybe
> >>> someone has implemented this algorithm before?
> >>>
> >> you can use indexes or you can use hstore
> >>
> >> Regards
> >>
> >> Pavel Stehule
> >>
> >>> --
> >>> Best regards
> >>> Aleksej Trofimov
>
> It is strange - on my machine FOREACH is about 2x faster:
>
> postgres=# select input_value(array(select generate_series(1,100)::numeric), 10);
>  input_value
> -------------
>
> (1 row)
>
> Time: 495.426 ms
>
> postgres=# select input_value_fe(array(select generate_series(1,100)::numeric), 10);
>  input_value_fe
> ----------------
>
> (1 row)
>
> Time: 248.980 ms
>
> Regards
>
> Pavel
>
> > --
> > Best regards
> > Aleksej Trofimov
> > UAB "Ruptela"
> > Phone: +370 657 80475
> > E-Mail: aleksej.trofi...@ruptela.lt
> > Web: http://www.ruptela.lt
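[Editor's note: until an unnest(..., SLICE=1) variant exists, a first-dimension scan can already be written with generate_subscripts (available since 8.4). This is only a sketch along the lines of the proposal above; the function name input_value_gs is made up for illustration and it has not been benchmarked against the FOREACH version.]

    CREATE OR REPLACE FUNCTION input_value_gs(in_inputs numeric[], in_input_nr numeric)
    RETURNS numeric AS
    $BODY$
        -- i runs over the subscripts of the first dimension only,
        -- so each $1[i][...] access touches one "row" of the 2-D array
        SELECT $1[i][2]
        FROM generate_subscripts($1, 1) AS i
        WHERE $1[i][1] = $2
        LIMIT 1;
    $BODY$
    LANGUAGE sql IMMUTABLE;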
Re: [PERFORM] Postgres array parser
Yes, it would be great, but I haven't found such a function which splits a two-dimensional array into rows =) Maybe we'll modify an existing function. In the meantime we have tried the hstore type and functions in Postgres, and we see significant performance improvements, so we only need to convert the existing data into hstore; I think this is a good solution.

On 12/14/2011 11:21 AM, Marc Mamin wrote:
> Hello,
>
> For such cases (see below), it would be nice to have an unnest function that only affects the first array dimension.
>
> Something like
>
>     unnest(ARRAY[[1,2],[2,3]], SLICE=1)
>     =>
>      unnest
>     --------
>      [1,2]
>      [2,3]
>
> With this function, I imagine that the following SQL function might beat the plpgsql FOREACH version:
>
>     CREATE OR REPLACE FUNCTION input_value_un (in_inputs numeric[], in_input_nr numeric)
>     RETURNS numeric AS
>     $BODY$
>         SELECT u[1][2]
>         FROM unnest($1, SLICE=1) u
>         WHERE u[1][1] = in_input_nr
>         LIMIT 1;
>     $BODY$
>     LANGUAGE sql IMMUTABLE;
>
> best regards,
>
> Marc Mamin
>
> [...]

--
Best regards
Aleksej Trofimov
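[Editor's note: for reference, a minimal sketch of what the hstore conversion described above might look like. The table name "inputs" and the columns "inputs_arr" and "inputs_h" are made up for illustration; the conversion assumes the {{key,value},...} layout from earlier in the thread and requires the hstore contrib module / extension.]

    -- one-off conversion of a {{1,5},{2,6},{3,7}}-style numeric[] into hstore,
    -- pairing the keys (column 1) with the values (column 2)
    ALTER TABLE inputs ADD COLUMN inputs_h hstore;

    UPDATE inputs
    SET inputs_h = hstore(
            ARRAY(SELECT inputs_arr[i][1]::text
                  FROM generate_subscripts(inputs_arr, 1) i),
            ARRAY(SELECT inputs_arr[i][2]::text
                  FROM generate_subscripts(inputs_arr, 1) i));

    -- looking up key 2 then becomes a single operator call instead of a loop:
    SELECT (inputs_h -> '2')::numeric FROM inputs;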
Re: [PERFORM] Postgres array parser
> Yes, it would be great, but I haven't found such a function which splits a
> 2-dimensional array into rows =) Maybe we'll modify an existing function,
> but we have tried the hstore type and functions in Postgres and we see
> significant performance improvements. So we only need to convert the
> existing data into hstore, and I think this is a good solution.

I haven't tested hstore yet, but I would be interested to find out whether it still performs better once you need custom "numeric" aggregates on the hstore values.

I've made a short "proof of concept" test with a custom key/value type to achieve such an aggregation. Something like:

    SELECT x, distinct_sum( (currency,amount)::keyval ) AS overview
    FROM ...
    GROUP BY x

     x  currency  amount
     a  EUR       15.0
     a  EUR        5.0
     a  CHF        7.5
     b  USD       12.0

    =>

     x  overview
     -  ------------------------
     a  {(EUR,20.0), (CHF,7.5)}
     b  {(USD,10.0)}

regards,

Marc Mamin

> On 12/14/2011 11:21 AM, Marc Mamin wrote:
> > Hello,
> >
> > For such cases (see below), it would be nice to have an unnest function
> > that only affects the first array dimension.
> > [...]
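[Editor's note: a minimal sketch of how such a keyval type and distinct_sum aggregate could be put together. This is an assumption about the proof of concept mentioned above, not Marc's actual code; it accumulates the per-key sums in an hstore, so the result is an hstore rather than the keyval list shown in the example output.]

    CREATE TYPE keyval AS (k text, v numeric);

    -- state transition: add the pair's value to the running sum for its key
    CREATE OR REPLACE FUNCTION keyval_acc(state hstore, item keyval)
    RETURNS hstore AS
    $BODY$
        SELECT $1 || hstore(($2).k,
                            (coalesce(($1 -> ($2).k)::numeric, 0) + ($2).v)::text);
    $BODY$
    LANGUAGE sql IMMUTABLE STRICT;

    CREATE AGGREGATE distinct_sum(keyval) (
        SFUNC    = keyval_acc,
        STYPE    = hstore,
        INITCOND = ''
    );

    -- usage, following the example above (table name is hypothetical):
    -- SELECT x, distinct_sum((currency, amount)::keyval) AS overview
    -- FROM payments GROUP BY x;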
Re: [PERFORM] copy vs. C function
try

    host all all 5.0.0.0/8 md5

On Wed, Dec 14, 2011 at 2:02 AM, idc danny wrote:
> Hi guys,
> A newbie question here, since I could not figure it out on my own:
> I'm using Hamachi to connect different sites into a VPN and their addresses
> always start with 5.*.*.* - the problem I'm facing is that I cannot restrict
> access to that particular range only.
> Currently I have:
>     host all all 0.0.0.0/32 md5
> which allows all the IPs, and the try:
>     host all all 5.0.0.0/32 md5
> does not work.
> So what am I supposed to add in "pg_hba.conf" in order to achieve my
> restriction? Please help me,
> Thank you,
> Danny
>
> From: Tom Lane
> To: Jon Nelson
> Cc: pgsql-performance@postgresql.org
> Sent: Wednesday, December 14, 2011 8:18 AM
> Subject: Re: [PERFORM] copy vs. C function
>
> Jon Nelson writes:
> > The only thing I have left are these statements:
> >     get_call_result_type
> >     TupleDescGetAttInMetadata
> >     BuildTupleFromCStrings
> >     HeapTupleGetDatum
> >     and finally PG_RETURN_DATUM
> >
> > It turns out that:
> >     get_call_result_type adds 43 seconds [total: 54],
> >     TupleDescGetAttInMetadata adds 19 seconds [total: 73],
> >     BuildTupleFromCStrings accounts for 43 seconds [total: 116].
> >
> > So those three functions account for 90% of the total time spent.
> > What alternatives exist? Do I have to call get_call_result_type /every
> > time/ through the function?
>
> Well, if you're concerned about performance then I think you're going about
> this in entirely the wrong way, because as far as I can tell from this you're
> converting all the field values to text and back again. You should be trying
> to keep the values in Datum format and then invoking heap_form_tuple. And
> yeah, you probably could cache the type information across calls.
>
>     regards, tom lane

--
Kevin P Martyn, CISSP
Principal Sales Engineer
(914) 819 8795 mobile
kevin.mar...@enterprisedb.com
Skype: kevin.martyn4
Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb
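[Editor's note: for context, a pg_hba.conf sketch showing why the original entry did not behave as intended. The CIDR suffix is the number of leading address bits that must match, so /32 matches a single address while /8 matches the whole 5.x.x.x range.]

    # matches only the single address 5.0.0.0
    host    all    all    5.0.0.0/32    md5

    # matches every address beginning with 5. (5.0.0.0 - 5.255.255.255)
    host    all    all    5.0.0.0/8     md5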
Re: [PERFORM] copy vs. C function
On Wed, Dec 14, 2011 at 12:18 AM, Tom Lane wrote:
> Jon Nelson writes:
>> The only thing I have left are these statements:
>>
>> get_call_result_type
>> TupleDescGetAttInMetadata
>> BuildTupleFromCStrings
>> HeapTupleGetDatum
>> and finally PG_RETURN_DATUM
>>
>> It turns out that:
>> get_call_result_type adds 43 seconds [total: 54],
>> TupleDescGetAttInMetadata adds 19 seconds [total: 73],
>> BuildTupleFromCStrings accounts for 43 seconds [total: 116].
>>
>> So those three functions account for 90% of the total time spent.
>> What alternatives exist? Do I have to call get_call_result_type /every
>> time/ through the function?
>
> Well, if you're concerned about performance then I think you're going
> about this in entirely the wrong way, because as far as I can tell from
> this you're converting all the field values to text and back again.
> You should be trying to keep the values in Datum format and then
> invoking heap_form_tuple. And yeah, you probably could cache the
> type information across calls.

The parsing/conversion (except BuildTupleFromCStrings) is only a small fraction of the overall time spent in the function and could probably be made slightly faster. It's the overhead that's killing me.

Remember: I'm not converting multiple field values to text and back again, I'm turning a *single* TEXT value into 8 columns of varying types (INET, INTEGER, and one INTEGER array, among others).

I'll re-write the code to use tuples, but given that 53% of the time is spent in just two functions (the two I'd like to cache) I'm not sure how much of a gain it's likely to be.

Regarding caching, I tried caching it across calls by making the TupleDesc static and only initializing it once. When I tried that, I got:

    ERROR: number of columns (6769856) exceeds limit (1664)

I tried to find some documentation or examples that cache the information, but couldn't find any.

--
Jon
Re: [PERFORM] copy vs. C function
Ah, that did the trick, thank you Kevin,

Danny

From: Kevin Martyn
To: idc danny
Cc: "pgsql-performance@postgresql.org"
Sent: Wednesday, December 14, 2011 3:14 PM
Subject: Re: [PERFORM] copy vs. C function

> try
>
>     host all all 5.0.0.0/8 md5
>
> [...]
Re: [PERFORM] copy vs. C function
Jon Nelson writes:
> Regarding caching, I tried caching it across calls by making the
> TupleDesc static and only initializing it once.
> When I tried that, I got:
>
> ERROR: number of columns (6769856) exceeds limit (1664)
>
> I tried to find some documentation or examples that cache the
> information, but couldn't find any.

You might find reading record_in to be helpful. What it caches is not exactly what you need to, I think, but it shows the general principles. There are lots of other functions that use fn_extra to cache info, too.

    regards, tom lane
Re: [PERFORM] copy vs. C function
On Wed, Dec 14, 2011 at 9:25 AM, Tom Lane wrote:
> Jon Nelson writes:
>> Regarding caching, I tried caching it across calls by making the
>> TupleDesc static and only initializing it once.
>> When I tried that, I got:
>>
>> ERROR: number of columns (6769856) exceeds limit (1664)
>>
>> I tried to find some documentation or examples that cache the
>> information, but couldn't find any.
>
> You might find reading record_in to be helpful. What it caches is not
> exactly what you need to, I think, but it shows the general principles.
> There are lots of other functions that use fn_extra to cache info, too.

I will definitely look into those. I'm probably doing it wrong, but in the meantime I allocated enough space (by way of MemoryContextAlloc) in TopMemoryContext for an AttInMetadata pointer, switched to that memory context (just the first time through), used CreateTupleDescCopy + TupleDescGetAttInMetadata to duplicate the TupleDesc in the new memory context, and then switched back. This approach seems to have dropped the total run time to about 54 seconds, the bulk of which is BuildTupleFromCStrings - a rather significant improvement.

Looking at record_in, I think I see what I could be doing better.

Again, thanks for the pointers.

--
Jon
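[Editor's note: a minimal C sketch of the fn_extra / fn_mcxt caching pattern discussed above, for a function declared to return a composite type. The function name parse_line and the 8-column layout are assumptions taken from this thread, not the poster's actual code.]

    #include "postgres.h"
    #include "fmgr.h"
    #include "funcapi.h"

    PG_MODULE_MAGIC;

    PG_FUNCTION_INFO_V1(parse_line);

    Datum
    parse_line(PG_FUNCTION_ARGS)
    {
        /* The AttInMetadata is cached in fn_extra, so it survives across calls. */
        AttInMetadata *attinmeta = (AttInMetadata *) fcinfo->flinfo->fn_extra;
        char          *values[8] = {NULL};   /* one C-string slot per output column */

        if (attinmeta == NULL)
        {
            MemoryContext oldcxt;
            TupleDesc     tupdesc;

            if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
                elog(ERROR, "function must be declared to return a composite type");

            /* Copy the descriptor into the long-lived fn_mcxt before caching it. */
            oldcxt = MemoryContextSwitchTo(fcinfo->flinfo->fn_mcxt);
            attinmeta = TupleDescGetAttInMetadata(CreateTupleDescCopy(tupdesc));
            MemoryContextSwitchTo(oldcxt);

            fcinfo->flinfo->fn_extra = attinmeta;
        }

        /* ... split the input text (PG_GETARG_TEXT_PP(0)) into values[0..7] here ... */

        PG_RETURN_DATUM(HeapTupleGetDatum(
            BuildTupleFromCStrings(attinmeta, values)));
    }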
Re: [PERFORM] copy vs. C function
On Wed, Dec 14, 2011 at 9:40 AM, Jon Nelson wrote:
> On Wed, Dec 14, 2011 at 9:25 AM, Tom Lane wrote:
>> Jon Nelson writes:
>>> Regarding caching, I tried caching it across calls by making the
>>> TupleDesc static and only initializing it once.
>>> When I tried that, I got:
>>>
>>> ERROR: number of columns (6769856) exceeds limit (1664)
>>>
>>> I tried to find some documentation or examples that cache the
>>> information, but couldn't find any.
>>
>> You might find reading record_in to be helpful. What it caches is not
>> exactly what you need to, I think, but it shows the general principles.
>> There are lots of other functions that use fn_extra to cache info, too.
>
> I will definitely look into those. I'm probably doing it wrong, but in the
> meantime I allocated enough space (by way of MemoryContextAlloc) in
> TopMemoryContext for an AttInMetadata pointer, switched to that memory
> context (just the first time through), used CreateTupleDescCopy +
> TupleDescGetAttInMetadata to duplicate the TupleDesc in the new memory
> context, and then switched back. This approach seems to have dropped the
> total run time to about 54 seconds, the bulk of which is
> BuildTupleFromCStrings - a rather significant improvement.
>
> Looking at record_in, I think I see what I could be doing better.

Indeed. I revised the code to make use of fcinfo->flinfo->fn_extra for storage and fcinfo->flinfo->fn_mcxt for the MemoryContext, and everything seemed to work just fine.

Assuming one *starts* with a char *some_var[8], would building the Datums myself be faster than using BuildTupleFromCStrings?

--
Jon
[PERFORM] Is it possible to use index on column for regexp match operator '~'?
For example, the WHERE condition is: where '' ~ col1. I created a normal index on col1, but it seems it is not used.
Re: [PERFORM] Partitions and joins lead to index lookups on all partitions
Back again. I did some tests with our test machine, running a difficult query that does some fancy stuff ;)

I made two versions, one using partitioned data and one using unpartitioned data, both having equivalent indexes. The query uses two of those big tables: one with 28 GB of data and 17 GB of indexes, the other with 25 GB of data and 41 GB of indexes (both for the unpartitioned versions).

Our test machine has 32 GB of memory; short config:

    maintenance_work_mem = 1GB
    checkpoint_completion_target = 0.9
    effective_cache_size = 22GB
    work_mem = 80MB
    wal_buffers = 8MB
    checkpoint_segments = 16
    shared_buffers = 7680MB
    max_connections = 400

First I tested query performance. It turned out that the unpartitioned version was about 36 times faster, of course for the obvious reason stated in my initial post. Both are fully using the indexes they have, and the partitioned version even has its indexes on SSD.

Then I did some insert tests, using generate_series to insert 10 rows into one of the tables. It turns out that the unpartitioned version is again faster, this time 30.9 vs 1.8 seconds. This is a huge difference. For the second table, with the huge 41 GB index, it's 30.5 vs 5.2 seconds - still a big difference.

Conclusion: partitioning does not benefit us, and probably others, especially when doing lots of joins and using parameterized queries.
Re: [PERFORM] Slow query after upgrade from 8.2 to 8.4
Hi,

Thanks for the reply. Actually, I finally found a solution. If I rewrite the query this way:

    explain analyze
    select 1
    from acc_clients AC, acc_debts AD, debts_desc DD, config CF
    where AC.ino = 204627
      AND CF.id = (select id from config
                   where confid = CF.confid AND office = 18
                   ORDER BY archived_at DESC LIMIT 1)
      AND AD.transact_no = AC.transact_no
      AND AD.debtid = DD.debtid
      AND DD.refid = CF.confid
    LIMIT 1;

the plan and execution time really improve: http://explain.depesz.com/s/Nkj

And for comparison I will repost the old way the query was written:

    explain analyze
    select 1
    from acc_clients AC, acc_debts AD, debts_desc DD, config CF
    where AC.ino = 120
      AND CF.id = (select id from config
                   where confid = CF.confid
                   ORDER BY archived_at DESC LIMIT 1)
      AND AD.transact_no = AC.transact_no
      AND AD.debtid = DD.debtid
      AND CF.office = 18
      AND DD.refid = CF.confid
    LIMIT 1;

This is the query plan of the above query: http://explain.depesz.com/s/ATN

When we have 8.4.9 installed I will try the query and post the result.

Best regards,
Kaloyan Iliev

Mark Kirkwood wrote:
> On 10/12/11 04:30, Tom Lane wrote:
>> However, it's not apparent to me why you would see any difference between
>> 8.2 and 8.4 on this type of query. I tried a query analogous to this one
>> on both, and got identical plans. I'm guessing that your slowdown is due
>> to not having updated statistics on the new installation, or perhaps
>> failing to duplicate some relevant settings.
>
> I notice he has 8.4.*8*... I wonder if he's running into the poor
> estimation bug for sub-selects/semi joins that was fixed in 8.4.9.
>
> Kaloyan, can you try the query in 8.4.9?
>
> regards
>
> Mark
Re: [PERFORM] Is it possible to use index on column for regexp match operator '~'?
2011/12/14 Rural Hunter:
> for example, the where condition is: where '' ~ col1. I created a normal
> index on col1 but seems it is not used.

I assume you want to search values that match one particular pattern, that would be:

    col1 ~ ''

The answer is: only very simple patterns that start with '^'. Note that you MUST use the text_pattern_ops index opclass:

    # create table words (word text);
    # copy words from '/usr/share/dict/words';
    # create index on words (word text_pattern_ops);
    # explain select * from words where word ~ '^post';
     Index Scan using words_word_idx on words  (cost=0.00..8.28 rows=10 width=9)
       Index Cond: ((word ~>=~ 'post'::text) AND (word ~<~ 'posu'::text))
       Filter: (word ~ '^post'::text)

If you just want to search for arbitrary strings, in PostgreSQL 9.1+ you can use the pg_trgm extension with a LIKE expression:

    # create extension pg_trgm;
    # create index on words using gist (word gist_trgm_ops);
    # explain select * from words where word like '%post%';
     Bitmap Heap Scan on words  (cost=4.36..40.23 rows=10 width=9)
       Recheck Cond: (word ~~ '%post%'::text)
       ->  Bitmap Index Scan on words_word_idx1  (cost=0.00..4.36 rows=10 width=0)
             Index Cond: (word ~~ '%post%'::text)

There's also the "wildspeed" external module, which is somewhat faster at this:
http://www.sai.msu.su/~megera/wiki/wildspeed

And someone is working on pg_trgm support for arbitrary regular expression searches. This *may* become part of the next major PostgreSQL release (9.2):
http://archives.postgresql.org/message-id/CAPpHfduD6EGNise5codBz0KcdDahp7--MhFz_JDD_FRPC7-i=a...@mail.gmail.com

Regards,
Marti
Re: [PERFORM] Is it possible to use index on column for regexp match operator '~'?
Actually, I store the pattern in col1. I want to get the rows whose col1 pattern matches one string, 'aaa'.

On 2011-12-15 4:43:37, Marti Raudsepp wrote:
> 2011/12/14 Rural Hunter:
>> for example, the where condition is: where '' ~ col1. I created a normal
>> index on col1 but seems it is not used.
>
> I assume you want to search values that match one particular pattern, that
> would be col1 ~ ''
>
> The answer is: only very simple patterns that start with '^'. Note that you
> MUST use the text_pattern_ops index opclass:
> [...]
Re: [PERFORM] copy vs. C function
On Wed, Dec 14, 2011 at 9:51 AM, Jon Nelson wrote:
> On Wed, Dec 14, 2011 at 9:40 AM, Jon Nelson wrote:
>> [...]
>> Looking at record_in, I think I see what I could be doing better.
>
> Indeed. I revised the code to make use of fcinfo->flinfo->fn_extra for
> storage and fcinfo->flinfo->fn_mcxt for the MemoryContext, and everything
> seemed to work just fine.
>
> Assuming one *starts* with a char *some_var[8], would building the Datums
> myself be faster than using BuildTupleFromCStrings?

The answer is: yes. At least, in my case it is. The total run time is now down to about 32 seconds, versus about 54 seconds for the BuildTupleFromCStrings version. 32 seconds is more than 10-15 seconds, but it's livable.

This experiment has been very worthwhile - thank you all for the help.

--
Jon
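[Editor's note: a rough sketch of what "building the Datums myself" can look like with heap_form_tuple, as suggested earlier in the thread. This is a fragment meant to replace the BuildTupleFromCStrings call in the earlier cached-AttInMetadata sketch; the column layout and the field variable names (ip_field, port_field) are assumptions for illustration, not the poster's actual code.]

    /* Inside the function body, after attinmeta has been looked up or cached: */
    Datum       values[8];
    bool        nulls[8] = {false};
    HeapTuple   tuple;

    /* Convert each parsed field straight to its Datum form instead of to text. */
    values[0] = DirectFunctionCall1(inet_in, CStringGetDatum(ip_field));   /* inet */
    values[1] = Int32GetDatum(port_field);                                 /* int4 */
    /* ... remaining columns, e.g. the integer array via construct_array() ... */

    tuple = heap_form_tuple(attinmeta->tupdesc, values, nulls);
    PG_RETURN_DATUM(HeapTupleGetDatum(tuple));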