Re: [GENERAL] tracking scripts...

2013-11-27 Thread Rémi Cura
First serious answer:
you don't have to use the command line;
you can use the pgAdmin GUI, load your file with all the commands, and
then hit F6 (or select run as pgscript).
This will wrap each command in its own transaction and will print messages
all along.
Please test this on a few lines before trying it on everything.

Second:
lol for the secret message.
There is a very famous one like this in French, in private letters by
George Sand, a famous writer.

The text is elegant, refined French, but if you read one line and then
skip the next,
the message is very dirty!

It is of course also very hard to translate ...

Cheers,
Rémi-C

Cher ami,
Je suis toute émue de vous dire que j'ai
bien compris l'autre jour que vous aviez
toujours une envie folle de me faire
danser. Je garde le souvenir de votre
baiser et je voudrais bien que ce soit
une preuve que je puisse être aimée
par vous. Je suis prête à montrer mon
affection toute désintéressée et sans cal-
cul, et si vous voulez me voir ainsi
vous dévoiler, sans artifice, mon âme
toute nue, daignez me faire visite,
nous causerons et en amis franchement
je vous prouverai que je suis la femme
sincère, capable de vous offrir l'affection
la plus profonde, comme la plus étroite
amitié, en un mot : la meilleure épouse
dont vous puissiez rêver. Puisque votre
âme est libre, pensez que l'abandon ou je
vis est bien long, bien dur et souvent bien
insupportable. Mon chagrin est trop
gros. Accourrez bien vite et venez me le
faire oublier. À vous je veux me sou-
mettre entièrement.
Votre poupée


2013/11/26 Merlin Moncure mmonc...@gmail.com

 On Tue, Nov 26, 2013 at 3:20 PM, Raymond O'Donnell r...@iol.ie wrote:
  On 26/11/2013 20:30, Merlin Moncure wrote:
  There are not many ways to Hand off information outside of the
  database while a transaction Is running. one way Is to write a Simple
  trigger in plpgsql that 'raise'es A notice every 'n' times trigger
  condition fires.  that'S Essentially the only Clean way to do it in
  such a way that the information is Returned to the Executing console.
  Thanks!
 
  Totally unrelated to the thread I noticed that the capitalised
  letters in the email above spell out this:
 
   THIISASECRET
 
  .. which (almost) spells "This is a secret". Was this intentional, or am
  I just working too hard? :-)

 Well, bad spelling on my part.  To get the joke, you have to be A.
 observant, B. be using a gmail account, and C. be a comic book geek
 that grew up in the 80's.

 merlin
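
A minimal sketch of the progress-notice trigger Merlin describes above --
the sequence, function, and table names and the 10,000-row interval are
hypothetical, not from the thread:

CREATE SEQUENCE progress_seq;

CREATE OR REPLACE FUNCTION notice_progress() RETURNS trigger AS $$
DECLARE
    n bigint := nextval('progress_seq');   -- counts rows across the whole statement
BEGIN
    IF n % 10000 = 0 THEN
        RAISE NOTICE 'processed % rows so far', n;  -- appears in the executing console
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER progress_trg BEFORE UPDATE ON big_table
    FOR EACH ROW EXECUTE PROCEDURE notice_progress();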





Re: [GENERAL] tracking scripts...

2013-11-27 Thread Albe Laurenz
John R Pierce wrote:
 On 11/26/2013 9:24 AM, Joey Quinn wrote:
 When I ran that command (select * from pg_stat_activity), it returned
 the first six lines of the scripts. I'm fairly sure it has gotten a
 bit beyond that (been running over 24 hours now, and the size has
 increased about 300 GB). Am I missing something for it to tell me what
 the last line processed was?
 
 that means your GUI lobbed the entire file at postgres in a single
 PQexec call, so it's all being executed as a single statement.
 
 psql -f filename.sql dbname   would have processed the queries one at
 a time.

Yes, but that would slow down processing considerably, which would
not help in this case.

I'd opt for
psql -1 -f filename.sql dbname
so it all runs in a single transaction.

Yours,
Laurenz Albe



Re: [GENERAL] tracking scripts...

2013-11-27 Thread Rémi Cura
I'm not an expert, but I would think that if you can afford to use only one
transaction, it would be way, way faster to do it!

The system could simply skip keeping a log to be ready to roll back a
1-billion-row update!

Of course it would be preferable to use psql to execute the statements one by
one as separate transactions, and to do it with X parallel psql sessions
(splitting the big text file into X parts), yet Joey seemed reluctant to use
the console =)
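
As a sketch of that splitting idea, assuming the file holds independent
single-line statements (the file names and the four-way split are arbitrary):

split -l 25000 commands.sql part_    # ~4 pieces for a 100k-line file
psql -f part_aa dbname &
psql -f part_ab dbname &
psql -f part_ac dbname &
psql -f part_ad dbname &
wait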


Cheers,
Rémi-C


2013/11/27 Albe Laurenz laurenz.a...@wien.gv.at

 John R Pierce wrote:
  [...]
  psql -f filename.sql dbname   would have processed the queries one at
  a time.

 Yes, but that would slow down processing considerably, which would
 not help in this case.

 I'd opt for
 psql -1 -f filename.sql dbname
 so it all runs in a single transaction.

 Yours,
 Laurenz Albe




Re: [GENERAL] tracking scripts...

2013-11-27 Thread John R Pierce

On 11/27/2013 1:39 AM, Rémi Cura wrote:


the system could simply skip keeping a log to be ready to roll back a
1-billion-row update


that's not how postgres does rollbacks -- with MVCC, the old row versions
simply stay in place, so there is no separate undo log whose cost a single
big transaction would avoid



--
john r pierce  37N 122W
somewhere on the middle of the left coast





Re: [GENERAL] tracking scripts...

2013-11-27 Thread Raymond O'Donnell
On 27/11/2013 08:20, Rémi Cura wrote:
 [...]
 Second:
 lol for the secret message.
 There is a very famous one like this in French, in private letters by
 George Sand, a famous writer.
 
 The text is elegant, refined French, but if you read one line
 and then skip the next,
 the message is very dirty!
 
 It is of course also very hard to translate ...

A guy I know finished his doctorate some years ago, more in
spite of his supervisor than with his assistance - the supervisor was
very little help, and at times actively discouraging. Anyway, when my
friend's thesis was printed, he included a short preface which thanked
the supervisor in glowing terms but the first letter of each line
spelt out the words F*** you Mike. :-)

Ray.


-- 
Raymond O'Donnell :: Galway :: Ireland
r...@iol.ie




Re: [GENERAL] having difficulty with explain analyze output

2013-11-27 Thread David Rysdam
On Tue, 26 Nov 2013 14:51:22 -0500, Martijn van Oosterhout klep...@svana.org 
wrote:
 The Seq Scan took 674ms and was run once (loops=1)
 
 The Materialise was run 94951 times and took, on average, 0.011ms to
 return the first row and 16ms to complete.
 
 16.145 * 94951 = 1532983.895

OK, this is helpful. But why would Materialize run 94k times? I thought
the whole point of Materialize was to make a virtual table that could
speed up parent queries. If it has to recreate the subquery, why would
the planner choose to Materialize?




[GENERAL] Complex sql, limit-for-each group by, arrays, updates

2013-11-27 Thread Dorian Hoxha
Hi,

So I have (the table the data will be read from):
CREATE TABLE data (vid, cid, pid, number);

Tables where the data will be written/updated:

CREATE TABLE pid_top_vids (pid, vid[])
CREATE TABLE pid_top_cids (pid, cid[])
CREATE TABLE cid_top_vids (cid, vid[])

I need to do the following, possibly in one query (it will run once in a while):

Get the top(10) vids, sorted by 'number', grouped by pid, and update the row in
TABLE(pid_top_vids).

Get the top(10) vids, sorted by 'number', grouped by cid, and update the row in
TABLE(cid_top_vids).

Get the top(10) cids, sorted by 'number', where number is the SUM() over each
cid's vids, and update the row in TABLE(pid_top_cids).

So: get the data, create the sorted arrays, and update the rows (they exist, so
only the arrays have to be updated).

Possible?
Thanks


Re: [GENERAL] help interpreting explain analyze output

2013-11-27 Thread Vik Fearing
On 11/26/2013 06:24 PM, David Rysdam wrote:
 I'm not really looking for information on how to speed this query
 up. I'm just trying to interpret the output enough to tell me which step
 is slow:

 Seq Scan on mags  (cost=0.00..187700750.56 rows=47476 width=4) (actual time=3004851.889..3004851.889 rows=0 loops=1)
   Filter: ((signum IS NOT NULL) AND (NOT (SubPlan 1)))
   SubPlan 1
     ->  Materialize  (cost=0.00..3713.93 rows=95862 width=4) (actual time=0.011..16.145 rows=48139 loops=94951)
           ->  Seq Scan on sigs  (cost=0.00..2906.62 rows=95862 width=4) (actual time=0.010..674.201 rows=95862 loops=1)
 Total runtime: 3004852.005 ms

 It looks like the inner seq scan takes 674ms, then the materialize takes
 an additional 16ms? Or is that 16ms * 94951? Or 674 * 94951?

 And the outer seq scan takes 3004851-3004851 = 0ms?

The actual time shown is not a range.  The first number is how long it
took to produce the first row, and the second is how long it took
overall.  The Seq Scan on mags took 3004851.889ms to produce the first
(non-existent) row and then it was finished so that's also the overall time.

SubPlan 1 took a total of 674.201ms + 94951 * 16.145ms = 1533658.096ms
which means the Filter line actually took 1471193.793ms to determine
there were no matching rows.

That is, if I understood it correctly myself.

-- 
Vik





Re: [GENERAL] Any advantage of using SSL with a certificate of authority?

2013-11-27 Thread Vick Khera
On Tue, Nov 26, 2013 at 4:48 PM, Bruce Momjian br...@momjian.us wrote:

 Right. I know of no mechanism to verify a certificate via a public CA
 through SSL.  Browsers have a list of trusted certificates, but SSL
 alone doesn't, as far as I know.


SSL as a library/protocol has mechanisms to verify the certificates. The
list of certificates that are trusted is left to the implementation. Most
people install the mozilla cert bundle and use that.
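
For example, with a libpq-based client you point at the bundle explicitly
(the host name and bundle path here are hypothetical; the path varies by
platform):

psql "host=db.example.com dbname=mydb sslmode=verify-full sslrootcert=/etc/ssl/certs/ca-bundle.crt"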


Re: [GENERAL] tracking scripts...

2013-11-27 Thread Joey Quinn
Wow, thank you (sometimes the answer is right there in front of you... very
new to Postgres; I had wondered what the difference was between the run query
and run as pgScript buttons, but hadn't looked into it yet).

So, here are the critical questions right now (for me)...

With the way I launched it, using the Execute query button, if I now hit
the Cancel query button, what happens? Have the last two days of updates
already been committed? Or will they get rolled back? I would love to
switch to the other method, so that I can gauge progress, but would hate to
lose two days' worth of run time...

If I do run the same script (a bit over 100k lines) in pgScript mode, will
that affect the speed? If so, how much? 1%? 5%? More?




On Wed, Nov 27, 2013 at 3:20 AM, Rémi Cura remi.c...@gmail.com wrote:

 First serious answer:
 you don't have to use the command line;
 you can use the pgAdmin GUI, load your file with all the commands, and
 then hit F6 (or select run as pgscript).
 This will wrap each command in its own transaction and will print messages
 all along.
 Please test this on a few lines before trying it on everything.

 [...]





Re: [GENERAL] tracking scripts...

2013-11-27 Thread Rémi Cura
Sorry,
if you cancel, everything will be rolled back
(it is actually what makes databases so powerful).

Unless it finishes, I don't know of a way to keep the changes.

At least on my computer (I don't know if you can generalize this), it is
way faster to split into many transactions, so you would gain time.
Using pgscript will make you lose some time, but it won't be much if each
query is long enough (some seconds at least).
If you intend to do this often, you may want to consider multiple parallel
psql sessions.

Cheers,

Rémi-C



2013/11/27 Joey Quinn bjquinn...@gmail.com

 With the way I launched it, using the Execute query button, if I now hit
 the Cancel query button, what happens? Have the last two days of updates
 already been committed? Or will they get rolled back?
 [...]






Re: [GENERAL] tracking scripts...

2013-11-27 Thread Joey Quinn
A little reluctant, yes, but not 100%. I'm new to Postgres, but if I end up
using it enough, then I will also end up learning some command-line stuff.
If it continues to look like a good/robust solution for this particular
project (think ERIPP http://www.eripp.com plus Shodan http://www.shodanhq.com/
and whatever else I can come up with) then I'll be here awhile...

Joey



On Wed, Nov 27, 2013 at 4:39 AM, Rémi Cura remi.c...@gmail.com wrote:

 I'm not an expert, but I would think that if you can afford to use only one
 transaction, it would be way, way faster to do it!

 The system could simply skip keeping a log to be ready to roll back a
 1-billion-row update!

 Of course it would be preferable to use psql to execute the statements one
 by one as separate transactions, and to do it with X parallel psql sessions
 (splitting the big text file into X parts), yet Joey seemed reluctant to
 use the console =)
 [...]





Re: [GENERAL] tracking scripts...

2013-11-27 Thread Joey Quinn
So, no cancel... keeping my fingers crossed (rolling up on 170,000,000 ms -
a bit over 47 hours - data folder size now at 1.11 TB).

Fortunately, I'm pretty sure this will be my largest batch update (since
the info is static, and available all at once, I was able to generate the
complete script. In most cases, I will be retrieving data and that takes a
bit longer).

When I end up having to rebuild the entire thing on a separate system, can
I divide the script up and launch it piecemeal in different PGAdmin query
windows for the parallel action? Or is there a particular command I have to
give it?



On Wed, Nov 27, 2013 at 9:23 AM, Rémi Cura remi.c...@gmail.com wrote:

 Sorry,
 if you cancel, everything will be rolled back
 (it is actually what makes databases so powerful).

 Unless it finishes, I don't know of a way to keep the changes.
 [...]







[GENERAL] Documentation of C functions

2013-11-27 Thread Janek Sendrowski
Hi,
Is there documentation of PostgreSQL's C functions, like SET_VARSIZE for
example?
 
Janek Sendrowski




Re: [GENERAL] tracking scripts...

2013-11-27 Thread Merlin Moncure
On Wed, Nov 27, 2013 at 8:35 AM, Joey Quinn bjquinn...@gmail.com wrote:
 So, no cancel... keeping my fingers crossed (rolling up on 170,000,000 ms -
 a bit over 47 hours - data folder size now at 1.11 TB).

 Fortunately, I'm pretty sure this will be my largest batch update (since the
 info is static, and available all at once, I was able to generate the
 complete script. In most cases, I will be retrieving data and that takes a
 bit longer).

 When I end up having to rebuild the entire thing on a separate system, can I
 divide the script up and launch it piecemeal in different PGAdmin query windows
 for the parallel action? Or is there a particular command I have to give it?

For very large updates on mostly static data it may be better to
SELECT the data into a new table and then swap it in when done.  My rule
of thumb is that updates are 10x more expensive than inserts,
particularly for large operations.

merlin




Re: [GENERAL] tracking scripts...

2013-11-27 Thread Joey Quinn
In this case, I'm updating one column. Wouldn't the swap part of that
still have to be an update?


On Wed, Nov 27, 2013 at 9:50 AM, Merlin Moncure mmonc...@gmail.com wrote:

  For very large updates on mostly static data it may be better to
  SELECT the data into a new table and then swap it in when done.  My rule
  of thumb is that updates are 10x more expensive than inserts,
  particularly for large operations.

 merlin



Re: [GENERAL] Documentation of C functions

2013-11-27 Thread Albe Laurenz
Janek Sendrowski wrote:
 Is there documentation of PostgreSQL's C functions, like SET_VARSIZE for
 example?

For things like this consult the source code.

In src/include/postgres.h you'll find:

/*
 * VARDATA, VARSIZE, and SET_VARSIZE are the recommended API for most code
 * for varlena datatypes.  Note that they only work on untoasted,
 * 4-byte-header Datums!
 *
 * Code that wants to use 1-byte-header values without detoasting should
 * use VARSIZE_ANY/VARSIZE_ANY_EXHDR/VARDATA_ANY.  The other macros here
 * should usually be used only by tuple assembly/disassembly code and
 * code that specifically wants to work with still-toasted Datums.
 *
 * WARNING: It is only safe to use VARDATA_ANY() -- typically with
 * PG_DETOAST_DATUM_PACKED() -- if you really don't care about the alignment.
 * Either because you're working with something like text where the alignment
 * doesn't matter or because you're not going to access its constituent parts
 * and just use things like memcpy on it anyways.
 */

The server-side C API is not covered in the documentation, because
it is already (hopefully) well documented in the source.

Yours,
Laurenz Albe



Re: [GENERAL] tracking scripts...

2013-11-27 Thread Merlin Moncure
On Wed, Nov 27, 2013 at 9:00 AM, Joey Quinn bjquinn...@gmail.com wrote:
 On Wed, Nov 27, 2013 at 9:50 AM, Merlin Moncure mmonc...@gmail.com wrote:
 For very large updates on mostly static data it may be better to
 SELECT the data into a new table then swap it in when done.   MY rule
 of thumb is that updates are 10x more expensive than inserts,
 particularly in terms of large operations.

 In this case, I'm updating one column. Wouldn't the swap part of that
 still have to be an update?


nope.  the basic mechanism is to:

BEGIN;
CREATE TABLE scratch (LIKE foo INCLUDING ALL);  -- copies structure, defaults, indexes
INSERT INTO scratch SELECT ... FROM foo ...;    -- rebuild the data in one pass
ALTER TABLE foo RENAME TO backup;               -- keep the old table around
ALTER TABLE scratch RENAME TO foo;              -- swap the new one in
COMMIT;

The main pain point is that you will have to recreate any table-dependent
structures: views, triggers, etc.  This is generally trivial
if you properly keep your schema definitions in scripts, and a big
headache otherwise.

You will probably want to avoid updates to 'foo' while the swap is
happening, to keep things simple.
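
For the single-column case being discussed, the INSERT step might look like
this (a sketch; the column list and the new-value expression are hypothetical):

INSERT INTO scratch
SELECT id, other_col, compute_new_value(id)   -- recompute just the changed column
FROM foo;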

merlin




[GENERAL] nested query vs left join: query planner very confused

2013-11-27 Thread David Rysdam
I've got two tables, sigs and mags. It's a one-to-one relationship, mags
is just split out because we store a big, less-often-used field
there. signum is the key field.

Sometimes I want to know if I have any orphans in mags, so I do a query
like this:

select signum from lp.Mags where signum is not null and signum not
in (select lp.Sigs.signum from lp.Sigs)

(I do this as a subquery because we originally had an old Sybase DB where
outer joins were a non-standard pain, and this way works the same and is
DBMS-agnostic.)

At my location, this query runs very fast (~50ms on a ~100k row table)
and 'explain' shows a plan with this structure:

Seq scan on mags
  Filter:
    SubPlan 1
      Seq scan on sigs

At my client's location, the query is very slow (same table size,
similar hardware/config, although they are running 9.0.x and I'm on
9.2.x). By slow I mean like an *hour*. 'explain' has this structure:

Seq scan on mags
  Filter:
    SubPlan 1
      Materialize
        Seq scan on sigs

I'd never heard of Materialize before, so I looked into it. Seems to
make a virtual table of the subquery so repetitions of the parent query
don't have to re-do the work. Sounds like it should only help, right?

The client's 'explain analyze' shows this:

   Seq Scan on mags  (cost=0.00..187700750.56 rows=47476 width=4) (actual time=3004851.889..3004851.889 rows=0 loops=1)
     Filter: ((signum IS NOT NULL) AND (NOT (SubPlan 1)))
     SubPlan 1
       ->  Materialize  (cost=0.00..3713.93 rows=95862 width=4) (actual time=0.011..16.145 rows=48139 loops=94951)
             ->  Seq Scan on sigs  (cost=0.00..2906.62 rows=95862 width=4) (actual time=0.010..674.201 rows=95862 loops=1)
   Total runtime: 3004852.005 ms

If I'm reading this correctly, the Materialize is running ~95k times,
taking the majority of the time. Why? The only thing I can think of is
this scenario:

1) server thinks it has a LOT of RAM
2) decides to Materialize subquery to take advantage
3) machine does not actually have that RAM, so it gets swapped
4) server notices it was swapped and decides to re-run rather than
unswap
5) goto 2

I don't know if that's a realistic scenario, but it's all I got. I'm
already well into unknown territory, performance-tuning-wise.

I also decided to try doing the query a different way:

 select lp.mags.signum from lp.mags left join lp.sigs on
 lp.mags.signum = lp.sigs.signum where lp.mags.signum is not null
 and lp.sigs.signum is null;

This one runs fast for both of us. So I guess my second question is: why
can't the query planner tell these are the same query? 




Re: [GENERAL] having difficulty with explain analyze output

2013-11-27 Thread Tom Lane
David Rysdam drys...@ll.mit.edu writes:
 On Tue, 26 Nov 2013 14:51:22 -0500, Martijn van Oosterhout 
 klep...@svana.org wrote:
 The Seq Scan took 674ms and was run once (loops=1)
 
 The Materialise was run 94951 times and took, on average, 0.011ms to
 return the first row and 16ms to complete.
 
 16.145 * 94951 = 1532983.895

 OK, this is helpful. But why would Materialize run 94k times? I thought
 the whole point of Materialize was to make a virtual table that could
 speed up parent queries. If it has to recreate the subquery, why would
 the planner choose to Materialize?

It *does* create a virtual table, or tuplestore --- what you are seeing
here is just the cost of pulling tuples out of that tuplestore.  There
were 48139 * (94951 - 1) = 4570798050 tuples returned out of the
tuplestore rather than by directly querying the underlying seqscan, and
the time spent doing that was 16.145 * 94951 - 674.201 = 1532309.694 msec,
so on your machine it takes about a third of a microsecond to pull a tuple
from the in-memory tuplestore, which sounds about right allowing for
EXPLAIN ANALYZE's instrumentation overhead.  Without the Materialize
node it'd have had to rescan the sigs table for each outer row, and
you'd probably still be waiting :-(

BTW, it's possible for a tuplestore to spill to disk, negating some of the
benefit, if the storage required exceeds work_mem.  If you have a small
work_mem setting, does it go any faster if you increase work_mem?

regards, tom lane




Re: [GENERAL] nested query vs left join: query planner very confused

2013-11-27 Thread bricklen
On Wed, Nov 27, 2013 at 7:56 AM, David Rysdam drys...@ll.mit.edu wrote:



 At my client's location, the query is very slow (same table size,
 similar hardware/config, although they are running 9.0.x and I'm on
 9.2.x). By slow I mean like an *hour*. 'explain' has this structure:

 Seq scan on mags
   Filter:
     SubPlan 1
       Materialize
         Seq scan on sigs

 I'd never heard of Materialize before, so I looked into it. Seems to
 make a virtual table of the subquery so repetitions of the parent query
 don't have to re-do the work. Sounds like it should only help, right?

 The client's 'explain analyze' shows this:

 Seq Scan on mags  (cost=0.00..187700750.56 rows=47476 width=4) (actual time=3004851.889..3004851.889 rows=0 loops=1)
   Filter: ((signum IS NOT NULL) AND (NOT (SubPlan 1)))
   SubPlan 1
     ->  Materialize  (cost=0.00..3713.93 rows=95862 width=4) (actual time=0.011..16.145 rows=48139 loops=94951)
           ->  Seq Scan on sigs  (cost=0.00..2906.62 rows=95862 width=4) (actual time=0.010..674.201 rows=95862 loops=1)
 Total runtime: 3004852.005 ms



Has the client ANALYZEd recently? What happens if the client issues the
following commands before executing the query?
VACUUM ANALYZE lp.sigs;
VACUUM ANALYZE lp.mags;

If that doesn't change the plan, could you post the values for
effective_cache_size, shared_buffers, random_page_cost, cpu_tuple_cost,
work_mem and how much RAM is in the client machine?


Re: [GENERAL] nested query vs left join: query planner very confused

2013-11-27 Thread Vik Fearing
On 11/27/2013 04:56 PM, David Rysdam wrote:
 I've got two tables, sigs and mags. It's a one-to-one relationship, mags
 is just split out because we store a big, less-often-used field
 there. signum is the key field.

 Sometimes I want to know if I have any orphans in mags, so I do a query
 like this:

 select signum from lp.Mags where signum is not null and signum not
 in (select lp.Sigs.signum from lp.Sigs)

[...]

 I also decided to try doing the query a different way:

select lp.mags.signum from lp.mags left join lp.sigs on
  lp.mags.signum = lp.sigs.signum where lp.mags.signum is not null
  and lp.sigs.signum is null;

 This one runs fast for both of us. So I guess my second question is: why
 can't the query planner tell these are the same query? 

Because they're not the same query.  NOT IN has a bunch of semantic
issues regarding NULLs which the anti-join in the second query does not
have.
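
A minimal illustration of the NULL trap:

SELECT 1 WHERE 2 NOT IN (SELECT unnest(ARRAY[1, NULL]));  -- zero rows
SELECT 1 WHERE 2 NOT IN (SELECT unnest(ARRAY[1, 3]));     -- one row

Because 2 <> NULL evaluates to NULL rather than true, the first NOT IN can
never be true, so the planner cannot simply turn it into an anti-join.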

-- 
Vik





Re: [GENERAL] nested query vs left join: query planner very confused

2013-11-27 Thread Tom Lane
David Rysdam drys...@ll.mit.edu writes:
 Sometimes I want to know if I have any orphans in mags, so I do a query
 like this:

 select signum from lp.Mags where signum is not null and signum not
 in (select lp.Sigs.signum from lp.Sigs)

 (I do this as a subquery because we originally had a old Sybase DB where
 outer joins were a non-standard pain and this way works the same and is
 DBMS-agnostic.)

DBMS-agnostic frequently means sucks equally badly on all DBMSes :-(
This query is hard to optimize because of the weird behavior of NOT IN
when nulls are involved.  Since you aren't complaining that the query
fails entirely, I'm supposing that lp.Sigs.signum contains no nulls,
but the planner doesn't know that.  If you can transform it to a NOT
EXISTS, you'll likely get a much better plan:

select signum from lp.Mags where signum is not null and
not exists (select 1 from lp.Sigs where lp.Sigs.signum = lp.Mags.signum)

What you want is an anti join plan, or at least a plan that mentions
a hashed subplan.  Plain subplans are death performance-wise, because
they amount to being nestloop joins rather than anything smarter.  (In
this case it's likely not choosing a hashed subplan because work_mem is
too small to allow that.)

 I also decided to try doing the query a different way:

select lp.mags.signum from lp.mags left join lp.sigs on
  lp.mags.signum = lp.sigs.signum where lp.mags.signum is not null
  and lp.sigs.signum is null;

That's another way to get an anti-join (at least on recent PGs, I forget
if 9.0 recognizes it).

 This one runs fast for both of us. So I guess my second question is: why
 can't the query planner tell these are the same query? 

They aren't.  See comment about behavior with NULLs.

regards, tom lane




Re: [GENERAL] nested query vs left join: query planner very confused

2013-11-27 Thread David Johnston
David Rysdam wrote
 I'd never heard of Materialize before, so I looked into it. Seems to
 make a virtual table of the subquery so repetitions of the parent query
 don't have to re-do the work. Sounds like it should only help, right?

Forgive any inaccuracies but I'm pretty sure about the following:

Materialize in this sense means what you need doesn't fit in memory (likely
the work_mem setting) and needs to be saved to disk and streamed from there.
Since I/O is expensive, this kills performance.  The virtual table concept is
mostly implemented by hashes (hash tables), not materialize.

The materialize node only runs once, creating a 95k-record table; that table
is then scanned 95k times to locate a potential match for each input row.
Since materialize does not build an index, each pass is a sequential scan,
which takes forever.


The other question, why the difference, is that IN has to accommodate NULLs
in the lookup table; a join does not, and neither does EXISTS.  If you can
replace the NOT IN with NOT EXISTS and write a correlated sub-query, you
should get the same plan as the LEFT JOIN version, IIRC.

David J.







Re: [GENERAL] nested query vs left join: query planner very confused

2013-11-27 Thread David Rysdam
On Wed, 27 Nov 2013 11:21:09 -0500, Tom Lane t...@sss.pgh.pa.us wrote:
 DBMS-agnostic frequently means sucks equally badly on all DBMSes :-(

We've generally been OK (cf the ~50ms runtime for the same query at our
site), but we also notice problems sooner than our client sometimes does
and can make algorithm improvements where we don't know how to make DB
ones.

 This query is hard to optimize because of the weird behavior of NOT IN
 when nulls are involved.  Since you aren't complaining that the query
 fails entirely, I'm supposing that lp.Sigs.signum contains no nulls,
 but the planner doesn't know that.  If you can transform it to a NOT
 EXISTS, you'll likely get a much better plan:
 
 select signum from lp.Mags where signum is not null and
 not exists (select 1 from lp.Sigs where lp.Sigs.signum = lp.Mags.signum)

We've already shipped to the client, but I'm looking at how extensive a
patch would have to be. Very surprising we haven't hit this issue
before.

 What you want is an anti join plan, or at least a plan that mentions
 a hashed subplan.  Plain subplans are death performance-wise, because
 they amount to being nestloop joins rather than anything smarter.  (In
 this case it's likely not choosing a hashed subplan because work_mem is
 too small to allow that.)

I've got the client looking for this and other values already. We'll
soon know...




Re: [GENERAL] nested query vs left join: query planner very confused

2013-11-27 Thread David Rysdam
On Wed, 27 Nov 2013 11:06:51 -0500, bricklen brick...@gmail.com wrote:
 Has the client ANALYZEd recently? What happens if the client issues
 the following commands before executing the query?
 VACUUM ANALYZE lp.sigs;
 VACUUM ANALYZE lp.mags;

 If that doesn't change the plan, could you post the values for
 effective_cache_size, shared_buffers, random_page_cost,
 cpu_tuple_cost, work_mem and how much RAM is in the client machine?

Yes, I did have them do a vacuum analyze with no result. Here's their
reply on configuration:

   It is 24 Gig.

   effective_cache_size - 12000MB
   shared_buffers - 1024MB
   random_page_cost - is commented out
   cpu_tuple_cost -  commented out
   work_mem - commented out

I assume you guys already know the default values for those last 3 on a
9.0.x server...




Re: [GENERAL] Complex sql, limit-for-each group by, arrays, updates

2013-11-27 Thread David Johnston
Dorian Hoxha wrote
 [...]
 I need to do the following, possibly in one query (it will run once in a
 while):
 
 Get the top(10) vids, sorted by 'number', grouped by pid, and update the
 row in TABLE(pid_top_vids).
 
 Get the top(10) vids, sorted by 'number', grouped by cid, and update the
 row in TABLE(cid_top_vids).
 
 Get the top(10) cids, sorted by 'number', where number is the SUM() over
 each cid's vids, and update the row in TABLE(pid_top_cids).
 [...]

I would not bother trying to do it in a single SQL statement but each query
is possible, and fairly basic, using UPDATE with a FROM clause containing
the desired sub-query.
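
For instance, the first of the three updates might look like this (a sketch;
it assumes the array column in pid_top_vids is named vids, and the other two
updates follow the same pattern):

UPDATE pid_top_vids t
SET vids = s.top_vids
FROM (SELECT pid,
             (array_agg(vid ORDER BY number DESC))[1:10] AS top_vids
      FROM data
      GROUP BY pid) s
WHERE t.pid = s.pid;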

David J.
 






Re: [GENERAL] help interpreting explain analyze output

2013-11-27 Thread hubert depesz lubaczewski
On Tue, Nov 26, 2013 at 12:24:08PM -0500, David Rysdam wrote:
 I'm not really looking for information on how to speed this query
 up. I'm just trying to interpret the output enough to tell me which step
 is slow:

You might want to read this:
http://www.depesz.com/tag/unexplainable/

Best regards,

depesz





Re: [GENERAL] nested query vs left join: query planner very confused

2013-11-27 Thread Tom Lane
David Rysdam drys...@ll.mit.edu writes:
  effective_cache_size - 12000MB
  shared_buffers - 1024MB
  random_page_cost - is commented out
  cpu_tuple_cost -  commented out
  work_mem - commented out

 I assume you guys already know the default values for those last 3 on a
 9.0.x server...

Default work_mem is only 1MB, so that probably explains why you're not
getting a hashed subplan here.  Have them knock it up some, say on the
order of 10MB.  (If none of your queries are any more complicated than
this one, you could go higher.  But keep in mind that a backend can use
work_mem per sort/hash/materialize step, not per query --- so complex
queries can use many times work_mem.  Multiply that by the number of
backends, and you can end up in swap hell pretty quickly with an over
optimistic value.)
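
For reference, the value can be tried per-session before touching
postgresql.conf (10MB here is just the suggested starting point):

SET work_mem = '10MB';
EXPLAIN ANALYZE
SELECT signum FROM lp.Mags
WHERE signum IS NOT NULL
  AND signum NOT IN (SELECT lp.Sigs.signum FROM lp.Sigs);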

regards, tom lane




Re: [GENERAL] nested query vs left join: query planner very confused

2013-11-27 Thread David Rysdam
On Wed, 27 Nov 2013 13:02:20 -0500, Tom Lane t...@sss.pgh.pa.us wrote:
 David Rysdam drys...@ll.mit.edu writes:
 effective_cache_size - 12000MB
 shared_buffers - 1024MB
 random_page_cost - is commented out
 cpu_tuple_cost -  commented out
 work_mem - commented out
 
  I assume you guys already know the default values for those last 3 on a
  9.0.x server...
 
 Default work_mem is only 1MB, so that probably explains why you're not
 getting a hashed subplan here.  Have them knock it up some, say on the
 order of 10MB.  (If none of your queries are any more complicated than
 this one, you could go higher.  But keep in mind that a backend can use
 work_mem per sort/hash/materialize step, not per query --- so complex
 queries can use many times work_mem.  Multiply that by the number of
 backends, and you can end up in swap hell pretty quickly with an over
 optimistic value.)

We deliberately try to keep our queries fairly simple for several
reasons. This isn't the most complicated, but they don't get much more
than this. I'll have them start with 10MB and see what they get.




Re: [GENERAL] nested query vs left join: query planner very confused

2013-11-27 Thread David Rysdam
On Wed, 27 Nov 2013 13:04:54 -0500, David Rysdam drys...@ll.mit.edu wrote:
 We deliberately try to keep our queries fairly simple for several
 reasons. This isn't the most complicated, but they don't get much more
 than this. I'll have them start with 10MB and see what they get.

10MB was enough to get that query to come back instantly. The same query
on some larger tables was still slow, so we ended up bumping up to 50MB
to get the entire job done. That probably sounds like a lot to you guys,
but now that we know the behavior and what kind of queries we have I
think we're OK.

(And if you want a shocker, when I looked at our own DB, we've had
work_mem set to 250MB on a lot of our servers and 1GB on our main
production machine. Heh. I've got some PG tuning books here next to me
now...)

Thanks!




Re: [GENERAL] ERROR: out of memory DETAIL: Failed on request of size ???

2013-11-27 Thread Tomas Vondra
On 27 November 2013, 22:39, Brian Wong wrote:
 Date: Fri, 22 Nov 2013 20:11:47 +0100
 Subject: Re: [GENERAL] ERROR: out of memory DETAIL: Failed on request of
 size ???
 From: t...@fuzzy.cz
 To: bwon...@hotmail.com
 CC: brick...@gmail.com; pgsql-general@postgresql.org

 On 19 November 2013, 5:30, Brian Wong wrote:
  I've tried any work_mem value from 1gb all the way up to 40gb, with no
  effect on the error.  I'd like to think of this problem as a server
  process memory (not the server's buffers) or client process memory
 issue,
  primarily because when we tested the error there was no other load
  whatsoever.  Unfortunately,  the error doesn't say what kinda memory
 ran
  out.

 Hi Brian,

 first of all, please don't top-post, especially if the previous response
 used bottom-post. Thank you ;-)

 Regarding the issue you're seeing:

 * Increasing work_mem in hope that it will make the issue go away is
 pointless. In case work_mem is too low, PostgreSQL will automatically
 spill the data to disk (e.g. it won't do a sort in memory, but will do a
 on-disk merge sort). It will never fail, and messages failed on request
 of size is actually coming from malloc, when requesting another chunk
 of
 memory from the OS. So you're hitting a OS-level memory limit.

 After changing the shared_buffers setting to 200MB, the developer has
 confirmed that the Out Of Memory error no longer happens.  So thanks
 folks.

 Playing with work_mem was out of desperation.  Postgresql simply giving
 the Out of memory error wasn't informative enough about the problem.
 For example, is it the server buffer, the server process, or the client
 process that's having a problem?

PostgreSQL is unable to give you more detailed information about the
cause because it simply does not have it. You're hitting some limit set at
the kernel level, so PostgreSQL calls malloc() and the kernel responds with
NULL. What details would you expect from PostgreSQL in that case other than
the simple statement 'we tried to allocate X bytes and it failed'?

Also, this kind of error tends to be 'random', i.e. the allocation
error may happen at a different place every time you run the query,
depending on how many backends are running etc.

The fact that decreasing shared buffers to 200 MB made the problem go away
for now only suggests this is a problem with some kernel limit (ulimit,
overcommit, ...). It also means you haven't solved it and it will likely
happen again in the future, e.g. when the amount of data grows and you
hit the limit again. Find and fix the actual issue.

Have you checked the ulimit / overcommit as I asked you to?
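
A sketch of those checks on a Linux box (run as the user the postmaster
starts under; which limits matter depends on the setup):

ulimit -a                            # per-process limits, e.g. 'virtual memory'
cat /proc/sys/vm/overcommit_memory   # 2 = strict accounting, malloc can fail early
cat /proc/sys/vm/overcommit_ratio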

 Note: AFAIK the only operation that does not spill to disk, and may fail
 with OOM-like errors is hash aggregate. But by increasing the work_mem
 you're actually encouraging PostgreSQL to do this planning error.

 I see the query you're running is doing MAX() so it might be hitting
 this
 issue. How much data are you dealing with? How many groups are in the
 result?

 * Setting shared buffers to 18GB is almost certainly a bad choice. I'm
 yet
 to see a real-world database where shared_buffers over 8GB actually make
 a
 measurable difference. More is not always better, and you're actually
 reserving memory that can't be used for work_mem (so you're making the
 issue worse). Please, decrease shared_buffers to e.g. 4GB, then try to
 increase it and measure the performance difference.

 If you look at the documentation about how to configure shared_buffers, it
 is very unclear to me how I can maximize performance by allocating as much
 memory to the buffer as possible.  On one hand, the documentation says I
 shouldn't go high on the shared_buffers setting.  On the other hand, the
 more memory you allocate to the buffers, the better the performance is
 supposedly.  So at least as of 9.1, this is annoying.  I heard that
 starting from 9.2, this behavior changed dramatically?

I'm not sure I understand what you're trying to say. It's true the
documentation does not give perfectly clear instructions on how to set
optimal shared_buffers. The reason is very simple - there's no such value,
optimal for all workloads, operating systems and PostgreSQL versions. The
assumption that more is always better is incorrect for several reasons.

There's a more detailed wiki page about this:

https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

The 200MB you've set seems way too low in your case, and aside from making
it slower than necessary, you most probably haven't fixed the actual
issue.

regards
Tomas





[GENERAL] Prefix search on all hstore values

2013-11-27 Thread Albert Chern
Hi,

I have an hstore column that stores a string in several arbitrary
languages, so something like this:

en = string in english, zh = string in chinese, fr = string
in french

Is it possible to construct an index that can be used to determine if a
query string is a prefix of ANY of the values in the hstore?  From reading
the documentation the closest I've gotten is a gin index after converting
the values to an array, but that doesn't seem to work with prefix
searching.  Any pointers would be much appreciated!
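
A sketch of the attempt described, assuming a table places with an hstore
column names, and using hstore's avals() to index the values as an array
(the default GIN array operators only match whole elements, e.g. with @>,
which is why prefixes don't work):

CREATE INDEX places_names_vals ON places USING gin (avals(names));

-- the index can answer whole-value containment:
SELECT * FROM places WHERE avals(names) @> ARRAY['string in english'];
-- but a prefix test such as  val LIKE 'string%'  cannot use it.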

Thanks,
Albert


[GENERAL] unnest on multi-dimensional arrays

2013-11-27 Thread Zev Benjamin
It appears that unnest, when called on a multi-dimensional array, 
effectively flattens the array first.  For example:


=> select * from unnest(array[array[1, 2], array[2, 3]]);
 unnest
--------
      1
      2
      2
      3
(4 rows)

while I would have expected something like the following:

=> select * from unnest(array[array[1, 2], array[2, 3]]);
 unnest
--------
 {1,2}
 {2,3}
(2 rows)

Is there any way to get the latter behavior?


Zev




Re: [GENERAL] unnest on multi-dimensional arrays

2013-11-27 Thread Pavel Stehule
Hello

postgres=# CREATE OR REPLACE FUNCTION public.reduce_dim(anyarray)
 RETURNS SETOF anyarray
 LANGUAGE plpgsql
AS $function$
DECLARE s $1%type;
BEGIN
  -- SLICE 1 iterates over the outermost dimension, yielding
  -- one-dimensional sub-arrays rather than individual elements
  FOREACH s SLICE 1 IN ARRAY $1 LOOP
      RETURN NEXT s;
  END LOOP;
RETURN;
END;
$function$;
CREATE FUNCTION

postgres=# select reduce_dim(array[array[1, 2], array[2, 3]]);
 reduce_dim
------------
 {1,2}
 {2,3}
(2 rows)

Regards

Pavel Stehule

2013/11/28 Zev Benjamin zev-pg...@strangersgate.com

 It appears that unnest, when called on a multi-dimensional array,
 effectively flattens the array first.
 [...]
 Is there any way to get the latter behavior?



Re: [GENERAL] unnest on multi-dimensional arrays

2013-11-27 Thread David Johnston
Zev Benjamin wrote
 It appears that unnest, when called on a multi-dimensional array,
 effectively flattens the array first.
 [...]
 Is there any way to get the latter behavior?

Manually.  Cross-join the array with a generate_series of the same length
and use the series number as an array index, so row 1 gets slice 1 and so
forth; see the sketch below.
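
A sketch of that idiom; note the slices come back still two-dimensional
(e.g. {{1,2}}), unlike the FOREACH ... SLICE function above:

SELECT t.arr[i:i] AS slice
FROM (SELECT array[array[1, 2], array[2, 3]] AS arr) t
     CROSS JOIN generate_series(1, 2) AS i;  -- 2 = array_length(arr, 1)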

Multidimensional arrays do have shortcomings in the current implementation,
of which this is one.  I'm not sure, though, whether there is anything
substantial and centralized in the docs on this point.  I may look later,
but the cross-join+sequence idiom is generally useful, especially for
problems like this.

David J.




