Re: [PERFORM] Postgres array parser

2011-12-14 Thread Marc Mamin
Hello,

For such cases (see below), it would be nice to have an unnest function that
only affects the first array dimension.

Something like 

unnest(ARRAY[[1,2],[2,3]], SLICE=1)
=>
unnest
--
[1,2]
[2,3]


With this function, I imagine that the following SQL function
might beat the plpgsql FOREACH version.


CREATE OR REPLACE FUNCTION input_value_un(in_inputs numeric[], in_input_nr numeric)
  RETURNS numeric AS
$BODY$

  SELECT u[1][2]
  FROM unnest($1, SLICE=1) u
  WHERE u[1][1] = in_input_nr
  LIMIT 1;

$BODY$
  LANGUAGE sql IMMUTABLE;
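
Note: as of 9.1, unnest() has no such SLICE option (it flattens all
dimensions), but generate_subscripts() can already iterate the first
dimension only. A minimal, untested sketch of the same lookup:

CREATE OR REPLACE FUNCTION input_value_gs(in_inputs numeric[], in_input_nr numeric)
  RETURNS numeric AS
$BODY$
  -- generate_subscripts($1, 1) yields only the first-dimension indexes,
  -- so $1[i] addresses one {key,value} pair per row
  SELECT $1[i][2]
  FROM generate_subscripts($1, 1) AS i
  WHERE $1[i][1] = $2
  LIMIT 1;
$BODY$
  LANGUAGE sql IMMUTABLE;

Called like the plpgsql versions, input_value_gs('{{1,5},{2,6},{3,7}}', 2)
should return 6.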

   
   
best regards,

Marc Mamin
   

> -----Original Message-----
> From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-
> ow...@postgresql.org] On Behalf Of Pavel Stehule
> Sent: Dienstag, 13. Dezember 2011 15:43
> To: Aleksej Trofimov
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Postgres array parser
> 
> Hello
> 
> 2011/12/13 Aleksej Trofimov :
> > We have tried foreach syntax, but we have noticed performance
> degradation:
> > Function with for: 203ms
> > Function with foreach: ~250ms:
> >
> > here is the functions' code:
> > CREATE OR REPLACE FUNCTION input_value_fe(in_inputs numeric[],
> in_input_nr
> > numeric)
> >  RETURNS numeric AS
> > $BODY$
> > declare i numeric[];
> > BEGIN
> >        FOREACH i SLICE 1 IN ARRAY in_inputs
> >            LOOP
> >                 if i[1] = in_input_nr then
> >                    return i[2];
> >                 end if;
> >            END LOOP;
> >
> >    return null;
> > END;
> > $BODY$
> >  LANGUAGE plpgsql VOLATILE
> >  COST 100;
> >
> > CREATE OR REPLACE FUNCTION input_value(in_inputs numeric[],
> in_input_nr
> > numeric)
> >  RETURNS numeric AS
> > $BODY$
> > declare
> >  size int;
> > BEGIN
> >  size = array_upper(in_inputs, 1);
> >    IF size IS NOT NULL THEN
> >
> >        FOR i IN 1 .. size LOOP
> >            if in_inputs[i][1] = in_input_nr then
> >                return in_inputs[i][2];
> >            end if;
> >        END LOOP;
> >    END IF;
> >
> >    return null;
> > END;
> > $BODY$
> >  LANGUAGE plpgsql VOLATILE
> >  COST 100;
> >
> >
> > On 12/13/2011 04:02 PM, Pavel Stehule wrote:
> >>
> >> Hello
> >>
> >> do you know FOREACH IN ARRAY statement in 9.1
> >>
> >> this significantly accelerate iteration over array
> >>
> >>
> >> http://www.depesz.com/index.php/2011/03/07/waiting-for-9-1-foreach-in-array/
> >>
> >>
> >>
> >> 2011/12/13 Aleksej Trofimov:
> >>>
> >>> Hello, I wanted to ask about a problem we have faced.
> >>> We are widely using postgres arrays as key->value maps, like
> >>> this:
> >>>
> >>> {{1,5},{2,6},{3,7}}
> >>>
> >>> where 1,2,3 are keys, and 5,6,7 are values. In our plpgsql functions
> >>> we are using a self-written array_input(array::numeric[], key::numeric)
> >>> function which loops over the whole array and searches for the key, like
> >>> FOR i IN 1 .. size LOOP
> >>>            if array[i][1] = key then
> >>>                return array[i][2];
> >>>            end if;
> >>> END LOOP;
> >>>
> >>> But this was a good solution only until our arrays and database had
> >>> grown. So now the FOR loop takes a lot of time to find a value in the
> >>> array.
> >>>
> >>> And my question is: how could this performance problem be solved? We
> >>> have tried plperl for string parsing, but it takes much more time than
> >>> our current solution. Also we are thinking about a self-written C++
> >>> function; maybe someone has implemented this algorithm before?
> >>>
> >> you can use indexes or you can use hstore
> >>
> >> Regards
> >>
> >> Pavel Stehule
> >>
> >>> --
> >>> Best regards
> >>>
> >>> Aleksej Trofimov
> >>>
> >>>
> >
> >
> 
> It is strange - on my comp FOREACH is about 2x faster
> 
> postgres=# select input_value(array(select
> generate_series(1,100)::numeric), 10);
>  input_value
> -------------
> 
> (1 row)
> 
> Time: 495.426 ms
> 
> postgres=# select input_value_fe(array(select
> generate_series(1,100)::numeric), 10);
>  input_value_fe
> ----------------
> 
> (1 row)
> 
> Time: 248.980 ms
> 
> Regards
> 
> Pavel
> 
> 
> >
> > --
> > Best regards
> >
> > Aleksej Trofimov
> >
> > UAB "Ruptela"
> >
> > Phone: +370 657 80475
> >
> > E-Mail: aleksej.trofi...@ruptela.lt
> > Web:    http://www.ruptela.lt
> >
> > Ruptela - the most successful IT company in Lithuania 2011
> > Ruptela - the most successful Lithuanian high-tech company 2011
> >
> > http://www.prezidentas.lt/lt/spaudos_centras_392/pranesimai_spaudai/inovatyvus_verslas_-_konkurencingos_lietuvos_pagrindas.html
> > http://www.ruptela.lt/news/37/121/Ruptela-sekmingiausia-jauna-aukstuju-technologiju-imone-Lietuvoje
> >
> 

Re: [PERFORM] Postgres array parser

2011-12-14 Thread Aleksej Trofimov
Yes, it would be great, but I haven't found such a function that splits a
2-dimensional array into rows =) Maybe we'll modify an existing function,
but meanwhile we have tried the hstore type and functions in postgres and
we see a significant performance improvement. So we only need to convert
the existing data into hstore, and I think this is a good solution.
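
For the archives, a minimal sketch of such a conversion (table and column
names are invented for illustration; hstore stores text, so values are
cast back to numeric on lookup):

CREATE EXTENSION hstore;   -- contrib/hstore, 9.1 syntax

-- hypothetical table whose inputs column holds pairs like {{1,5},{2,6},{3,7}}
ALTER TABLE measurements ADD COLUMN inputs_h hstore;

UPDATE measurements m
SET inputs_h = (SELECT hstore(array_agg(m.inputs[i][1]::text),
                              array_agg(m.inputs[i][2]::text))
                FROM generate_subscripts(m.inputs, 1) AS i);

-- the lookup becomes a single hash probe instead of a linear scan:
SELECT (inputs_h -> '2')::numeric FROM measurements;   -- 6 for the pair {2,6}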


On 12/14/2011 11:21 AM, Marc Mamin wrote:

[...]



--
Best regards

Aleksej Trofimov




Re: [PERFORM] Postgres array parser

2011-12-14 Thread Marc Mamin
> Yes, it would be great, but I haven't found such a function that
> splits a 2-dimensional array into rows =) Maybe we'll modify an existing
> function, but meanwhile we have tried the hstore type and functions in
> postgres and we see a significant performance improvement. So we only
> need to convert the existing data into hstore and I think this is a good
> solution.



I haven't tested hstore yet, but I would be interested to find out whether
it still performs better with custom "numeric" aggregates on the hstore
values.

I've made a short "proof of concept" test with a custom key/value type to
achieve such an aggregation.
Something like:


   SELECT x, distinct_sum( (currency,amount)::keyval ) AS overview
   FROM ... GROUP BY x

   x  currency  amount
   -  --------  ------
   a  EUR         15.0
   a  EUR          5.0
   a  CHF          7.5
   b  USD         12.0
   =>

   x  overview
   -  --------
   a  {(EUR,20.0), (CHF,7.5)}
   b  {(USD,12.0)}
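
The same overview can already be produced with built-in aggregates, at the
cost of a subquery. A sketch, assuming the keyval composite type from the
test above and a hypothetical payments table:

   CREATE TYPE keyval AS (k text, v numeric);

   SELECT x, array_agg((currency, total)::keyval) AS overview
   FROM (SELECT x, currency, sum(amount) AS total
         FROM payments
         GROUP BY x, currency) s
   GROUP BY x;

   -- a => {"(EUR,20.0)","(CHF,7.5)"},  b => {"(USD,12.0)"}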


regards,

Marc Mamin

 
> On 12/14/2011 11:21 AM, Marc Mamin wrote:
> > [...]

Re: [PERFORM] copy vs. C function

2011-12-14 Thread Kevin Martyn
try
host all all 5.0.0.0/8  md5
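
The /NN suffix is a CIDR mask length, not a wildcard count: /32 matches
exactly one address (so 0.0.0.0/32 matches only 0.0.0.0 itself), while /8
masks only the first octet and covers 5.0.0.0 through 5.255.255.255. As a
pg_hba.conf excerpt (layout assumed, not copied from Danny's file):

# TYPE  DATABASE  USER  ADDRESS      METHOD
host    all       all   5.0.0.0/8    md5     # any 5.x.x.x Hamachi peer
# 0.0.0.0/0 (not /32) would be the spelling for "all IPv4 addresses"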

On Wed, Dec 14, 2011 at 2:02 AM, idc danny  wrote:

> Hi guys,
> A noob question here since I could not figure it out on my own:
> I'm using Hamachi to connect different sites into a VPN and their addresses
> always start with 5.*.*.* - the problem I'm facing is that I cannot restrict
> access to that particular range only.
> Currently I have:
> host all all 0.0.0.0/32 md5
> which allows all the IPs, and the attempt:
> host all all 5.0.0.0/32 md5
> does not work.
> So what am I supposed to add in "pg_hba.conf" in order to achieve my
> restriction? Please help me.
> Thank you,
> Danny


-- 
Kevin P Martyn, CISSP
Principal Sales Engineer
(914) 819 8795 mobile
kevin.mar...@enterprisedb.com
Skype:   kevin.martyn4

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb



Re: [PERFORM] copy vs. C function

2011-12-14 Thread Jon Nelson
On Wed, Dec 14, 2011 at 12:18 AM, Tom Lane  wrote:
> Jon Nelson  writes:
>> The only thing I have left are these statements:
>
>> get_call_result_type
>> TupleDescGetAttInMetadata
>> BuildTupleFromCStrings
>> HeapTupleGetDatum
>> and finally PG_RETURN_DATUM
>
>> It turns out that:
>> get_call_result_type adds 43 seconds [total: 54],
>> TupleDescGetAttInMetadata adds 19 seconds [total: 73],
>> BuildTupleFromCStrings accounts for 43 seconds [total: 116].
>
>> So those three functions account for 90% of the total time spent.
>> What alternatives exist? Do I have to call get_call_result_type /every
>> time/ through the function?
>
> Well, if you're concerned about performance then I think you're going
> about this in entirely the wrong way, because as far as I can tell from
> this you're converting all the field values to text and back again.
> You should be trying to keep the values in Datum format and then
> invoking heap_form_tuple.  And yeah, you probably could cache the
> type information across calls.

The parsing/conversion (except BuildTupleFromCStrings) is only a small
fraction of the overall time spent in the function and could probably
be made slightly faster. It's the overhead that's killing me.

Remember: I'm not converting multiple field values to text and back
again, I'm turning a *single* TEXT into 8 columns of varying types
(INET, INTEGER, and one INTEGER array, among others).  I'll re-write
the code to use Tuples but given that 53% of the time is spent in just
two functions (the two I'd like to cache) I'm not sure how much of a
gain it's likely to be.

Regarding caching, I tried caching it across calls by making the
TupleDesc static and only initializing it once.
When I tried that, I got:

ERROR:  number of columns (6769856) exceeds limit (1664)

I tried to find some documentation or examples that cache the
information, but couldn't find any.

-- 
Jon



Re: [PERFORM] copy vs. C function

2011-12-14 Thread idc danny
Ah, that did the trick, thank you Kevin,
Danny





Re: [PERFORM] copy vs. C function

2011-12-14 Thread Tom Lane
Jon Nelson  writes:
> Regarding caching, I tried caching it across calls by making the
> TupleDesc static and only initializing it once.
> When I tried that, I got:

> ERROR:  number of columns (6769856) exceeds limit (1664)

> I tried to find some documentation or examples that cache the
> information, but couldn't find any.

You might find reading record_in to be helpful.  What it caches is not
exactly what you need to, I think, but it shows the general principles.
There are lots of other functions that use fn_extra to cache info, too.

regards, tom lane



Re: [PERFORM] copy vs. C function

2011-12-14 Thread Jon Nelson
On Wed, Dec 14, 2011 at 9:25 AM, Tom Lane  wrote:
> Jon Nelson  writes:
>> Regarding caching, I tried caching it across calls by making the
>> TupleDesc static and only initializing it once.
>> When I tried that, I got:
>
>> ERROR:  number of columns (6769856) exceeds limit (1664)
>
>> I tried to find some documentation or examples that cache the
>> information, but couldn't find any.
>
> You might find reading record_in to be helpful.  What it caches is not
> exactly what you need to, I think, but it shows the general principles.
> There are lots of other functions that use fn_extra to cache info, too.

I will definitely look into those. I'm probably doing it wrong, but in
the meantime, I allocated enough space (by way of MemoryContextAlloc)
in TopMemoryContext for an AttInMetadata pointer, switched to that
memory context (just the first time through), used CreateTupleDescCopy
+ TupleDescGetAttInMetadata to duplicate (in the new memory context)
the TupleDesc, and then switched back. This approach seems to have
dropped the total run time to about 54 seconds, the bulk of which is
BuildTupleFromCStrings, a rather significant improvement.



Looking at record_in, I think I see what I could be doing better.

Again, thanks for the pointers.


-- 
Jon



Re: [PERFORM] copy vs. C function

2011-12-14 Thread Jon Nelson
On Wed, Dec 14, 2011 at 9:40 AM, Jon Nelson  wrote:
> [...]
> Looking at record_in, I think I see what I could be doing better.

Indeed. I revised the code to make use of fcinfo->flinfo->fn_extra for
storage and fcinfo->flinfo->fn_mcxt for the MemoryContext and
everything seemed to work just fine.
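
For the archives, a minimal sketch of that caching pattern (the function
name and the 8-column layout are hypothetical; the calls are the stock
funcapi.h ones discussed above):

#include "postgres.h"
#include "fmgr.h"
#include "funcapi.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(my_parse_line);     /* hypothetical function name */

Datum
my_parse_line(PG_FUNCTION_ARGS)
{
    AttInMetadata *attinmeta = (AttInMetadata *) fcinfo->flinfo->fn_extra;

    if (attinmeta == NULL)
    {
        /* First call: look up the result type once, copy it into
         * fn_mcxt so the copy survives across calls, and cache it. */
        TupleDesc     tupdesc;
        MemoryContext oldcxt;

        if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
            elog(ERROR, "function must return a composite type");

        oldcxt = MemoryContextSwitchTo(fcinfo->flinfo->fn_mcxt);
        attinmeta = TupleDescGetAttInMetadata(CreateTupleDescCopy(tupdesc));
        MemoryContextSwitchTo(oldcxt);

        fcinfo->flinfo->fn_extra = attinmeta;
    }

    {
        char     *values[8] = {NULL};   /* 8 text columns assumed */
        HeapTuple tuple;

        /* ... parse the input TEXT and fill values[0..7] ... */
        tuple = BuildTupleFromCStrings(attinmeta, values);
        PG_RETURN_DATUM(HeapTupleGetDatum(tuple));
    }
}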

Assuming one *starts* with a char *some_var[8], would building Datum
myself be faster than using BuildTupleFromCStrings?

-- 
Jon



[PERFORM] Is it possible to use index on column for regexp match operator '~'?

2011-12-14 Thread Rural Hunter
for example, the where condition is: where '' ~ col1. I created a
normal index on col1 but it seems it is not used.




Re: [PERFORM] Partitions and joins lead to index lookups on all partitions

2011-12-14 Thread voodooless
Back again,

I did some tests with our test machine, having a difficult query doing some
fancy stuff ;)

I made two versions, one using partitioned data and one using unpartitioned
data, both having equivalent indexes. The query uses two of those big
tables, one with 28GB data and 17GB index, one with 25GB data and 41GB
indexes (both for the unpartitioned versions). Our test machine has 32GB of
memory; short config:

maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
effective_cache_size = 22GB
work_mem = 80MB
wal_buffers = 8MB
checkpoint_segments = 16
shared_buffers = 7680MB
max_connections = 400

At first I tested the query performance. It turned out that the
unpartitioned version was about 36 times faster, of course for the obvious
reason stated in my initial post. Both fully use the indexes they have, and
the partitioned version even has its indexes on SSD.

Then I did some insert tests using generate_series to insert 10 rows
into one of the tables. It turns out that the unpartitioned version is again
faster: 1.8 seconds versus 30.9 seconds for the partitioned one. This is a
huge difference. For the second table, with the huge 41GB index, it's 5.2 vs
30.5 seconds, still a big difference.

Conclusion: partitioning does not benefit us, and probably others, especially
when doing lots of joins and using parameterized queries.






Re: [PERFORM] Slow query after upgrade from 8.2 to 8.4

2011-12-14 Thread Kaloyan Iliev Iliev

Hi,
Thanks for the reply. Actually, I finally found a solution. If I rewrite the
query in this way (moving the office = 18 filter into the correlated
subquery):

explain analyze select 1
  from acc_clients AC,
       acc_debts AD,
       debts_desc DD,
       config CF
  where AC.ino = 204627 AND
        CF.id = (select id
                 from config
                 where confid = CF.confid AND office = 18
                 ORDER BY archived_at DESC LIMIT 1) AND
        AD.transact_no = AC.transact_no AND
        AD.debtid = DD.debtid AND
        DD.refid = CF.confid LIMIT 1;

the plan and execution time really improve.
http://explain.depesz.com/s/Nkj

And for comparison I will repost the old way the query was written:

explain analyze select 1
  from acc_clients AC,
       acc_debts AD,
       debts_desc DD,
       config CF
  where AC.ino = 120 AND
        CF.id = (select id
                 from config
                 where confid = CF.confid
                 ORDER BY archived_at DESC LIMIT 1) AND
        AD.transact_no = AC.transact_no AND
        AD.debtid = DD.debtid AND
        CF.office = 18 AND
        DD.refid = CF.confid LIMIT 1;

This is the query plan of the upper query.
http://explain.depesz.com/s/ATN

When we have 8.4.9 installed I will try the query and post the result.

Best regards,
Kaloyan Iliev


Mark Kirkwood wrote:

On 10/12/11 04:30, Tom Lane wrote:
However, it's not apparent to me why you would see any difference 
between 8.2 and 8.4 on this type of query. I tried a query analogous 
to this one on both, and got identical plans. I'm guessing that your 
slowdown is due to not having updated statistics on the new 
installation, or perhaps failing to duplicate some relevant settings.


I notice he has 8.4.*8*... I wonder if he's running into the poor 
estimation bug for sub-selects/semi joins that was fixed in 8.4.9.


Kaloyan, can you try the query in 8.4.9?

regards

Mark





Re: [PERFORM] Is it possible to use index on column for regexp match operator '~'?

2011-12-14 Thread Marti Raudsepp
2011/12/14 Rural Hunter :
> for example, the where condition is: where '' ~ col1. I created a normal
> index on col1 but seems it is not used.

I assume you want to search values that match one particular pattern,
that would be col1 ~ ''

The answer is, only very simple patterns that start with '^'. Note
that you MUST use the text_pattern_ops index opclass:

# create table words (word text);
# copy words from '/usr/share/dict/words';
# create index on words (word text_pattern_ops);
# explain select * from words where word ~ '^post';
Index Scan using words_word_idx on words  (cost=0.00..8.28 rows=10 width=9)
  Index Cond: ((word ~>=~ 'post'::text) AND (word ~<~ 'posu'::text))
  Filter: (word ~ '^post'::text)



If you just want to search for arbitrary strings, in PostgreSQL 9.1+
you can use pg_trgm extension with a LIKE expression:

# create extension pg_trgm;
# create index on words using gist (word gist_trgm_ops);
# explain select * from words where word like '%post%';
Bitmap Heap Scan on words  (cost=4.36..40.23 rows=10 width=9)
  Recheck Cond: (word ~~ '%post%'::text)
  ->  Bitmap Index Scan on words_word_idx1  (cost=0.00..4.36 rows=10 width=0)
Index Cond: (word ~~ '%post%'::text)



There's also the "wildspeed" external module which is somewhat faster
at this: http://www.sai.msu.su/~megera/wiki/wildspeed

And someone is working to get pg_trgm support for arbitrary regular
expression searches. This *may* become part of the next major
PostgreSQL release (9.2)
http://archives.postgresql.org/message-id/CAPpHfduD6EGNise5codBz0KcdDahp7--MhFz_JDD_FRPC7-i=a...@mail.gmail.com

Regards,
Marti



Re: [PERFORM] Is it possible to use index on column for regexp match operator '~'?

2011-12-14 Thread Rural Hunter
Actually, I stored the pattern in col1. I want to get the rows whose
col1 pattern matches a given string 'aaa'.
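
A note for the archives: indexes only help when the indexed column is
compared against a constant pattern. With the patterns stored in col1,
every row's regex must be evaluated against the search string, so a query
of this form always sequential-scans (table name hypothetical):

SELECT * FROM stored_patterns WHERE 'aaa' ~ col1;   -- no index applies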


On 2011-12-15 04:43:37, Marti Raudsepp wrote:

[...]







Re: [PERFORM] copy vs. C function

2011-12-14 Thread Jon Nelson
On Wed, Dec 14, 2011 at 9:51 AM, Jon Nelson  wrote:
> [...]
> Indeed. I revised the code to make use of fcinfo->flinfo->fn_extra for
> storage and fcinfo->flinfo->fn_mcxt for the MemoryContext and
> everything seemed to work just fine.
>
> Assuming one *starts* with a char *some_var[8], would building Datum
> myself be faster than using BuildTupleFromCStrings?

The answer is: yes. At least, in my case it is.
The total run time is now down to about 32 seconds,
versus about 54 seconds with BuildTupleFromCStrings.
32 seconds is more than 10-15 seconds, but it's livable.
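
For the archives, a sketch of the Datum-based construction Tom suggested
(the three columns shown are illustrative, not the real schema; tupdesc is
assumed to be a cached, BlessTupleDesc()'d row descriptor):

#include "postgres.h"
#include "fmgr.h"
#include "funcapi.h"
#include "access/htup.h"
#include "catalog/pg_type.h"
#include "utils/array.h"
#include "utils/builtins.h"

static Datum
build_row_from_datums(TupleDesc tupdesc, int32 id, char *ip_cstring,
                      Datum *elems, int nelems)
{
    Datum     values[3];
    bool      nulls[3] = {false, false, false};
    HeapTuple tuple;

    values[0] = Int32GetDatum(id);
    /* inet_in parses the C string straight into an inet Datum,
     * with no detour through the generic text I/O layer */
    values[1] = DirectFunctionCall1(inet_in, CStringGetDatum(ip_cstring));
    /* an int4[] built directly from Datums */
    values[2] = PointerGetDatum(construct_array(elems, nelems,
                                                INT4OID, 4, true, 'i'));

    tuple = heap_form_tuple(tupdesc, values, nulls);
    return HeapTupleGetDatum(tuple);
}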

This experiment has been very worthwhile - thank you all for the help.

-- 
Jon
