Re: [PERFORM] HFS+ pg_test_fsync performance

2014-04-15 Thread desmodemone
2014-04-15 0:32 GMT+02:00 Mel Llaguno mllag...@coverity.com:

 I was given anecdotal information that HFS+ performance under OSX makes it
 unsuitable for production PG deployments, and that pg_test_fsync
 could be used to measure the relative speed versus other operating systems
 (such as Linux). In my performance lab, I have a number of similarly
 equipped Linux hosts (Ubuntu 12.04 64-bit LTS Server w/ 128GB RAM / 2 OWC
 6G Mercury Extreme SSDs / 7200rpm SATA3 HDD / 16 E5-series cores) that I
 used to capture baseline Linux numbers. As we generally recommend our
 customers use SSDs (the S3700 recommended by PG), I wanted to perform a
 comparison. On these beefy machines I ran the following tests:

 SSD:

 # pg_test_fsync -f ./fsync.out -s 30
 30 seconds per test
 O_DIRECT supported on this platform for open_datasync and open_sync.

 Compare file sync methods using one 8kB write:
 (in wal_sync_method preference order, except fdatasync
 is Linux's default)
 open_datasync                            n/a
 fdatasync                2259.652 ops/sec     443 usecs/op
 fsync                    1949.664 ops/sec     513 usecs/op
 fsync_writethrough                       n/a
 open_sync                2245.162 ops/sec     445 usecs/op

 Compare file sync methods using two 8kB writes:
 (in wal_sync_method preference order, except fdatasync
 is Linux's default)
 open_datasync                            n/a
 fdatasync                2161.941 ops/sec     463 usecs/op
 fsync                    1891.894 ops/sec     529 usecs/op
 fsync_writethrough                       n/a
 open_sync                1118.826 ops/sec     894 usecs/op

 Compare open_sync with different write sizes:
 (This is designed to compare the cost of writing 16kB
 in different write open_sync sizes.)
  1 * 16kB open_sync write     2171.558 ops/sec     460 usecs/op
  2 *  8kB open_sync writes    1126.490 ops/sec     888 usecs/op
  4 *  4kB open_sync writes     569.594 ops/sec    1756 usecs/op
  8 *  2kB open_sync writes     285.149 ops/sec    3507 usecs/op
 16 *  1kB open_sync writes     142.528 ops/sec    7016 usecs/op

 Test if fsync on non-write file descriptor is honored:
 (If the times are similar, fsync() can sync data written
 on a different descriptor.)
 write, fsync, close  1947.557 ops/sec 513 usecs/op
 write, close, fsync  1951.082 ops/sec 513 usecs/op

 Non-Sync'ed 8kB writes:
 write   481296.909 ops/sec   2 usecs/op


 HDD:

 pg_test_fsync -f /tmp/fsync.out -s 30
 30 seconds per test
 O_DIRECT supported on this platform for open_datasync and open_sync.

 Compare file sync methods using one 8kB write:
 (in wal_sync_method preference order, except fdatasync
 is Linux's default)
 open_datasync                            n/a
 fdatasync                 105.783 ops/sec    9453 usecs/op
 fsync                      27.692 ops/sec   36111 usecs/op
 fsync_writethrough                       n/a
 open_sync                 103.399 ops/sec    9671 usecs/op

 Compare file sync methods using two 8kB writes:
 (in wal_sync_method preference order, except fdatasync
 is Linux's default)
 open_datasync                            n/a
 fdatasync                 104.647 ops/sec    9556 usecs/op
 fsync                      27.223 ops/sec   36734 usecs/op
 fsync_writethrough                       n/a
 open_sync                  55.839 ops/sec   17909 usecs/op

 Compare open_sync with different write sizes:
 (This is designed to compare the cost of writing 16kB
 in different write open_sync sizes.)
  1 * 16kB open_sync write      103.581 ops/sec    9654 usecs/op
  2 *  8kB open_sync writes      55.207 ops/sec   18113 usecs/op
  4 *  4kB open_sync writes      28.320 ops/sec   35311 usecs/op
  8 *  2kB open_sync writes      14.581 ops/sec   68582 usecs/op
 16 *  1kB open_sync writes       7.407 ops/sec  135003 usecs/op

 Test if fsync on non-write file descriptor is honored:
 (If the times are similar, fsync() can sync data written
 on a different descriptor.)
 write, fsync, close          27.228 ops/sec   36727 usecs/op
 write, close, fsync          27.108 ops/sec   36890 usecs/op

 Non-Sync'ed 8kB writes:
 write   466108.001 ops/sec   2 usecs/op


 ---

 So far, so good. Local HDD vs. SSD shows a significant difference in fsync
 performance. Here are the corresponding fstab entries:

 /dev/mapper/cim-base  /opt/cim  ext4  defaults,noatime,nodiratime,discard  0  2    (SSD)


Re: [PERFORM] HFS+ pg_test_fsync performance

2014-04-15 Thread Mel Llaguno


My 2 cents:

The results are not surprising. In the Linux environment the I/O calls of
pg_test_fsync use O_DIRECT (PG_O_DIRECT) together with the O_SYNC or O_DSYNC
flags, so in practice it waits for the answer from the storage, bypassing the
filesystem cache, in sync mode. On Mac OS X it does not do that: it only uses
the O_SYNC or O_DSYNC calls without O_DIRECT, so in practice it goes through
the filesystem cache even though it is asking for synchronous I/O.
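
For what it's worth, the rows in that pg_test_fsync output correspond to the
possible wal_sync_method settings, so it is easy to double-check from psql
which method a given server is actually configured to use (a quick sanity
check only):

-- which sync method this server uses, and whether fsync is on
SHOW wal_sync_method;
SHOW fsync;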


Bye

Mat Dba



Thanks for the explanation. Given that OSX always seems to use the filesystem
cache, is there a way to measure fsync performance that is equivalent to Linux?
Or will pg_test_fsync results always be inflated under OSX? The reason I ask
is that we would like to make a case to a customer that PG performance on
OSX/HFS+ would be sub-optimal compared to using Linux/EXT4 (or FreeBSD/UFS2 for
that matter).
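
One thing I was considering trying, if I understand the docs correctly, is
forcing writes through the drive cache on OSX with fsync_writethrough (which
uses F_FULLFSYNC there), since that should be the closest analogue to the
Linux O_DIRECT numbers. A sketch only -- I have not verified this on our
hardware yet:

-- postgresql.conf (wal_sync_method cannot be SET per session):
--   wal_sync_method = fsync_writethrough
-- then, after a reload, confirm from psql:
SHOW wal_sync_method;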

Thanks, Mel


Re: [PERFORM] [GENERAL] Approach to Data Summary and Analysis

2014-04-15 Thread Chris Curvey
On Mon, Apr 14, 2014 at 12:27 PM, Robert DiFalco
robert.difa...@gmail.com wrote:

 I have several related tables that represent a call state. Let's think of
 these as phone calls to simplify things. Sometimes I need to determine the
 last time a user was called, the last time a user answered a call, or the
 last time a user completed a call.

 The basic schema is something like this:

 CREATE TABLE calls (
   id       BIGINT NOT NULL,  -- sequence generator
   user_id  BIGINT NOT NULL,
   called   TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,

   PRIMARY KEY (id),
   FOREIGN KEY (user_id) REFERENCES my_users(id) ON DELETE CASCADE
 );

 CREATE TABLE calls_answered (
   id        BIGINT NOT NULL,
   answered  TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,

   PRIMARY KEY (id),
   FOREIGN KEY (id) REFERENCES calls(id) ON DELETE CASCADE
 );


 And so on for calls_connected, calls_completed, call_errors, etc.

 Occasionally I will want to know things like "When was the last time a
 user answered a call?" or "How many times has a user been called?".

 I can do these queries using a combination of MAX or COUNT. But I'm
 concerned about the performance.

 SELECT MAX(a.id)
 FROM calls_answered a JOIN calls c ON c.id = a.id
 WHERE c.user_id = ?;


 Or the number of answered calls:

 SELECT COUNT(a.id)
 FROM calls_answered a JOIN calls c ON c.id = a.id
 WHERE c.user_id = ?;


 Sometimes I might want to get this data for a whole bunch of users. For
 example, give me all users who have not answered a call in the last 5
 days. Or even what percentage of users called actually answered a call.
 This approach could become a performance issue. So the other option is to
 create a call_summary table that is updated with triggers.

 The summary table would need fields like user_id, last_call_id,
 call_count, last_answered_id, answered_count, last_completed_id,
 last_completed_count, etc.

 My only issue with a summary table is that I don't want a bunch of null
 fields. For example, if the user was *called* but they have never
 *answered* a call then the last_call_id and call_count fields on the
 summary table would be non-NULL but the last_answer_id and answer_count
 fields WOULD be NULL. But over time all fields would eventually become
 non-NULL.

 So that leads me to a summary table for EACH call state. Each summary
 table would have a user id, a ref_id, and a count -- one summary table for
 each state e.g. call_summary, call_answered_summary, etc.

 This approach has the down side that it creates a lot of tables and
 triggers. It has the upside of being pretty efficient without having to
 deal with NULL values.  It's also pretty easy to reason about.

 So for my question -- is the choice between these a personal preference
 sort of thing or is there a right or wrong approach? Am I missing another
 approach that would be better?  I'm okay with SQL but I'm not an expert so I'm
 not sure if there is an accepted DESIGN PATTERN for this that I am missing.

 Thanks!



My initial thought is:  that design is over-normalized. The thing you are
trying to model is the call, and it has several attributes, some of which
may be unknown or not applicable (which is what NULL is for).  So my
thought would be to do something like this:

CREATE TABLE calls (
  id       BIGINT NOT NULL,  -- sequence generator
  user_id  BIGINT NOT NULL,
  called   TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
  answered TIMESTAMPTZ,

  PRIMARY KEY (id),
  FOREIGN KEY (user_id) REFERENCES my_users(id) ON DELETE CASCADE
);


-- 
I asked the Internet how to train my cat, and the Internet told me to get a
dog.


Re: [PERFORM] [GENERAL] Approach to Data Summary and Analysis

2014-04-15 Thread Chris Curvey
On Tue, Apr 15, 2014 at 10:56 AM, Chris Curvey ch...@chriscurvey.com wrote:

 On Mon, Apr 14, 2014 at 12:27 PM, Robert DiFalco robert.difa...@gmail.com
  wrote:

 I have several related tables that represent a call state. Let's think of
 these as phone calls to simplify things. Sometimes I need to determine the
 last time a user was called, the last time a user answered a call, or the
 last time a user completed a call.

 The basic schema is something like this:

 CREATE TABLE calls (
   id       BIGINT NOT NULL,  -- sequence generator
   user_id  BIGINT NOT NULL,
   called   TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,

   PRIMARY KEY (id),
   FOREIGN KEY (user_id) REFERENCES my_users(id) ON DELETE CASCADE
 );

 CREATE TABLE calls_answered (
   id        BIGINT NOT NULL,
   answered  TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,

   PRIMARY KEY (id),
   FOREIGN KEY (id) REFERENCES calls(id) ON DELETE CASCADE
 );


 And so on for calls_connected, calls_completed, call_errors, etc.

 Occasionally I will want to know things like "When was the last time a
 user answered a call?" or "How many times has a user been called?".

 I can do these queries using a combination of MAX or COUNT. But I'm
 concerned about the performance.

 SELECT MAX(a.id)
 FROM calls_answered a JOIN calls c ON c.id = a.id
 WHERE c.user_id = ?;


 Or the number of answered calls:

 SELECT COUNT(a.id)
 FROM calls_answered a JOIN calls c ON c.id = a.id
 WHERE c.user_id = ?;


 Sometimes I might want to get this data for a whole bunch of users. For
 example, give me all users who have not answered a call in the last 5
 days. Or even what percentage of users called actually answered a call.
 This approach could become a performance issue. So the other option is to
 create a call_summary table that is updated with triggers.

 The summary table would need fields like user_id, last_call_id,
 call_count, last_answered_id, answered_count, last_completed_id,
 last_completed_count, etc.

 My only issue with a summary table is that I don't want a bunch of null
 fields. For example, if the user was *called* but they have never
 *answered* a call then the last_call_id and call_count fields on the
 summary table would be non-NULL but the last_answer_id and answer_count
 fields WOULD be NULL. But over time all fields would eventually become
 non-NULL.

 So that leads me to a summary table for EACH call state. Each summary
 table would have a user id, a ref_id, and a count -- one summary table for
 each state e.g. call_summary, call_answered_summary, etc.

 This approach has the down side that it creates a lot of tables and
 triggers. It has the upside of being pretty efficient without having to
 deal with NULL values.  It's also pretty easy to reason about.

 So for my question -- is the choice between these a personal preference
 sort of thing or is there a right or wrong approach? Am I missing another
 approach that would be better?  I'm okay with SQL but I'm not an expert so I'm
 not sure if there is an accepted DESIGN PATTERN for this that I am missing.

 Thanks!



 (Sorry, fat-fingered and hit send too early...)

CREATE TABLE calls (
  id         BIGINT NOT NULL,  -- sequence generator
  user_id    BIGINT NOT NULL,
  called     TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
  answered   TIMESTAMPTZ NULL,
  connected  TIMESTAMPTZ NULL,
  completed  TIMESTAMPTZ NULL,

  PRIMARY KEY (id),
  FOREIGN KEY (user_id) REFERENCES my_users(id) ON DELETE CASCADE
);

Then  your queries end up looking like this:

-- last time john answered
SELECT MAX(answered)
FROM calls
where answered is not null
and user_id = ?

-- users that have not answered a call in the last five days (I can think
-- of a few ways to interpret that phrase)
select my_users.*
from my_users
where not exists
(  select *
   from calls
   where calls.user_id = my_users.id
   and answered >= now() - interval '5 days')

-- average ring time
select avg(extract(epoch from answered - called))
from calls
where answered is not null
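
And the "what percentage of users called actually answered" question from the
original post becomes a single aggregate on the same table (a sketch; written
with CASE rather than FILTER so it works on pre-9.4 servers):

-- share of called users who answered at least once
select 100.0 * count(distinct case when answered is not null then user_id end)
             / count(distinct user_id) as pct_answered
from calls;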



-- 
I asked the Internet how to train my cat, and the Internet told me to get a
dog.


Re: [PERFORM] [GENERAL] Approach to Data Summary and Analysis

2014-04-15 Thread Robert DiFalco
Actually that was exactly the initial table design. There were more fields
because for my use case there were a lot more states and certain states
have additional data (for example when a call goes from answered to
connected it also gets the user_id of the person being connected to). So
that one table started getting a LOT of columns, which started making it
hard to reason about.

The more normalized version has a couple of things going for it. COUNT,
MIN, MAX, etc are very fast because I don't have to conditionally add null
checks. Everything is inserted so for the millions of calls that get made
the normalized schema was much more efficient for writing. It was also
easier to understand. The answer table only has calls that were answered,
the error table only has calls that resulted in an error after being
connected, etc.
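
For example, the count of answered calls for one user needs no NULL handling
at all on this schema (same tables as above; the ? is a bind parameter):

SELECT COUNT(*)
FROM calls_answered a JOIN calls c ON c.id = a.id
WHERE c.user_id = ?;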

I know this kind of gets into a religious area when discussing NULLs and
what level of normalization is appropriate so I don't want to spark any of
that on this thread. But only doing inserts and never doing updates or
deletes performed very well for large data sets.

That said, I could explore a compromise between the monolithic table
approach and the completely normalized set of tables approach. Thanks for
your input!
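
P.S. In case it helps to see the trigger-maintained summary concretely, here is
roughly what one per-state summary table would look like (only a sketch --
names are illustrative and the upsert is not concurrency-safe as written):

CREATE TABLE call_answered_summary (
  user_id           BIGINT PRIMARY KEY REFERENCES my_users(id) ON DELETE CASCADE,
  last_answered_id  BIGINT NOT NULL,
  answered_count    BIGINT NOT NULL DEFAULT 0
);

CREATE OR REPLACE FUNCTION bump_answered_summary() RETURNS trigger AS $$
BEGIN
  -- calls_answered.id references calls.id, so look up the user there
  UPDATE call_answered_summary s
     SET last_answered_id = NEW.id,
         answered_count   = s.answered_count + 1
    FROM calls c
   WHERE c.id = NEW.id
     AND s.user_id = c.user_id;
  IF NOT FOUND THEN
    INSERT INTO call_answered_summary (user_id, last_answered_id, answered_count)
    SELECT c.user_id, NEW.id, 1
      FROM calls c
     WHERE c.id = NEW.id;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER call_answered_summary_trg
  AFTER INSERT ON calls_answered
  FOR EACH ROW EXECUTE PROCEDURE bump_answered_summary();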


On Tue, Apr 15, 2014 at 8:12 AM, Chris Curvey ch...@chriscurvey.com wrote:




 On Tue, Apr 15, 2014 at 10:56 AM, Chris Curvey ch...@chriscurvey.com wrote:

 On Mon, Apr 14, 2014 at 12:27 PM, Robert DiFalco 
 robert.difa...@gmail.com wrote:

 I have several related tables that represent a call state. Let's think
 of these as phone calls to simplify things. Sometimes I need to determine
 the last time a user was called, the last time a user answered a call, or
 the last time a user completed a call.

 The basic schema is something like this:

 CREATE TABLE calls (
   id       BIGINT NOT NULL,  -- sequence generator
   user_id  BIGINT NOT NULL,
   called   TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,

   PRIMARY KEY (id),
   FOREIGN KEY (user_id) REFERENCES my_users(id) ON DELETE CASCADE
 );

 CREATE TABLE calls_answered (
   id        BIGINT NOT NULL,
   answered  TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,

   PRIMARY KEY (id),
   FOREIGN KEY (id) REFERENCES calls(id) ON DELETE CASCADE
 );


 And so on for calls_connected, calls_completed, call_errors, etc.

 Occasionally I will want to know things like "When was the last time a
 user answered a call?" or "How many times has a user been called?".

 I can do these queries using a combination of MAX or COUNT. But I'm
 concerned about the performance.

 SELECT MAX(a.id)
 FROM calls_answered a JOIN calls c ON c.id = a.id
 WHERE c.user_id = ?;


 Or the number of answered calls:

 SELECT COUNT(a.id)
 FROM calls_answered a JOIN calls c ON c.id = a.id
 WHERE c.user_id = ?;


 Sometimes I might want to get this data for a whole bunch of users. For
 example, give me all users who have not answered a call in the last 5
 days. Or even what percentage of users called actually answered a call.
 This approach could become a performance issue. So the other option is to
 create a call_summary table that is updated with triggers.

 The summary table would need fields like user_id, last_call_id,
 call_count, last_answered_id, answered_count, last_completed_id,
 last_completed_count, etc.

 My only issue with a summary table is that I don't want a bunch of null
 fields. For example, if the user was *called* but they have never
 *answered* a call then the last_call_id and call_count fields on the
 summary table would be non-NULL but the last_answer_id and answer_count
 fields WOULD be NULL. But over time all fields would eventually become
 non-NULL.

 So that leads me to a summary table for EACH call state. Each summary
 table would have a user id, a ref_id, and a count -- one summary table for
 each state e.g. call_summary, call_answered_summary, etc.

 This approach has the down side that it creates a lot of tables and
 triggers. It has the upside of being pretty efficient without having to
 deal with NULL values.  It's also pretty easy to reason about.

 So for my question -- is the choice between these a personal preference
 sort of thing or is there a right or wrong approach? Am I missing another
 approach that would be better?  I'm okay with SQL but I'm not an expert so I'm
 not sure if there is an accepted DESIGN PATTERN for this that I am missing.

 Thanks!



 (Sorry, fat-fingered and hit send too early...)

 CREATE TABLE calls (
   id         BIGINT NOT NULL,  -- sequence generator
   user_id    BIGINT NOT NULL,
   called     TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
   answered   TIMESTAMPTZ NULL,
   connected  TIMESTAMPTZ NULL,
   completed  TIMESTAMPTZ NULL,

   PRIMARY KEY (id),
   FOREIGN KEY (user_id) REFERENCES my_users(id) ON DELETE CASCADE
 );

 Then  your queries end up looking like this:

 --last time john answered
 SELECT MAX(answered)
 FROM calls
 where answered is not null
 and user_id = ?

 -- users that have not answered a 

[PERFORM] Testing strategies

2014-04-15 Thread Dave Cramer
I have a client wanting to test PostgreSQL on ZFS running Linux.

Other than pgbench, are there any other benchmarks that are easy to run?

One of the possible concerns is fragmentation over time. Any ideas on how
to fragment the database before running pgbench?

Also there is some concern about fragmentation of the WAL logs. I am
looking at testing with and without the WAL logs on ZFS. Any other specific
concerns?


Dave Cramer
credativ ltd (Canada)

78 Zina St
Orangeville, ON
Canada. L9W 1E8

Office: +1 (905) 766-4091
Mobile: +1 (519) 939-0336

===
Canada:  http://www.credativ.ca
USA: http://www.credativ.us
Germany: http://www.credativ.de
Netherlands: http://www.credativ.nl
UK:  http://www.credativ.co.uk
India:   http://www.credativ.in
===


Re: [PERFORM] Tuning Postgres for Single connection use

2014-04-15 Thread Nick Eubank
On Mon, Apr 14, 2014 at 5:19 PM, Jeff Janes jeff.ja...@gmail.com wrote:

 On Mon, Apr 14, 2014 at 2:46 PM, Nick Eubank nickeub...@gmail.com wrote:

 Any rules of thumb for work_mem, maintenance_work_mem, shared_buffers,
 etc. for a database that DOESN'T anticipate concurrent connections and that
 is doing lots of aggregate functions on large tables? All the advice I
 can find online on tuning
 (this: http://wiki.postgresql.org/wiki/Performance_Optimization ,
 this: http://media.revsys.com/talks/djangocon/2011/secrets-of-postgresql-performance.pdf ,
 this: http://www.revsys.com/writings/postgresql-performance.html , etc.)
 is written for people anticipating lots of concurrent connections.

 I'm a social scientist looking to use Postgres not as a database to be
 shared by multiple users, but rather as my own tool for manipulating a
 massive data set (I have 5 billion transaction records (600gb in csv) and
 want to pull out unique user pairs, estimate aggregates for individual
 users, etc.). This also means almost no writing, except for the creation of new
 tables based on selections from the main table.

 I'm on a Windows 8 VM with 16gb ram, SCSI VMware HD, and 3 cores if
 that's important.


 I'd go with a small shared_buffers, like 128MB, and let the OS cache as
 much as possible.  This minimizes the amount of double buffering.

 And set work_mem to about 6GB, then bump it up if that doesn't seem to
 cause problems.

 In the scenario you describe, it is probably no big deal if you guess too
 high.  Monitor the process; if it starts to go nuts just kill it and
 start again with a lower work_mem.  If it is a single user system, you can
 afford to be adventurous.

 If you need to build indexes, you should bump up maintenance_work_mem, but
 I just would do that in the local session not system wide.

 Cheers,

 Jeff



Quick followup Jeff: it seems that I can't set work_mem above about 1GB
(I can't get to 2GB -- when I update the config, the values just don't change
in SHOW ALL; an integer constraint?). Is there a workaround, or should I
tweak something else accordingly?

Thanks!

Nick

(Properly bottom posted this time?)


Re: [PERFORM] Tuning Postgres for Single connection use

2014-04-15 Thread Jeff Janes
On Tue, Apr 15, 2014 at 9:12 AM, Nick Eubank nickeub...@gmail.com wrote:

 On Mon, Apr 14, 2014 at 5:19 PM, Jeff Janes jeff.ja...@gmail.com wrote:


 I'd go with a small shared_buffers, like 128MB, and let the OS cache as
 much as possible.  This minimizes the amount of double buffering.

 And set work_mem to about 6GB, then bump it up if that doesn't seem to
 cause problems.

 In the scenario you describe, it is probably no big deal if you guess too
  high.  Monitor the process; if it starts to go nuts just kill it and
 start again with a lower work_mem.  If it is a single user system, you can
 afford to be adventurous.

 If you need to build indexes, you should bump up maintenance_work_mem,
 but I just would do that in the local session not system wide.

 Cheers,

 Jeff



 Quick followup Jeff: it seems that I can't set work_mem above about 1gb
 (can't get to 2gb. When I update config, the values just don't change in
 SHOW ALL -- integer constraint?). Is there a work around, or should I
 tweak something else accordingly?



What version are you using?  What is the exact line you put in your config
file?  Did you get any errors when using that config file?  Are you sure
you actually reloaded the server, so that it reread the config file, rather
than just changing the file and then not applying the change?

I usually set work_mem within a psql connection, in which case you need to
quote the setting if you use units:

set work_mem = '3GB';

But if you set it system wide in the config file the quotes should not be
needed.
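
For example (the values here are only illustrative):

-- in a psql session, units need the quotes:
SET work_mem = '1GB';
SHOW work_mem;

-- in postgresql.conf the quotes are optional:
--   work_mem = 1GB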



 Thanks!

 Nick

 (Properly bottom posted this time?)


Looked good to me.

Cheers,

Jeff


Re: [PERFORM] Tuning Postgres for Single connection use

2014-04-15 Thread Schnabel, Robert D.


From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Nick Eubank
Sent: Tuesday, April 15, 2014 11:12 AM
To: Jeff Janes
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Tuning Postgres for Single connection use



On Mon, Apr 14, 2014 at 5:19 PM, Jeff Janes jeff.ja...@gmail.com wrote:
On Mon, Apr 14, 2014 at 2:46 PM, Nick Eubank nickeub...@gmail.com wrote:
Any rules of thumb for work_mem, maintenance_work_mem, shared_buffer, etc. for 
a database that DOESN'T anticipate concurrent connections and that is doing 
lots of aggregate functions on large tables? All the advice I can find online 
on tuning (this, this, this etc.) is written for people anticipating lots of 
concurrent connections.
I'm a social scientist looking to use Postgres not as a database to be shared 
by multiple users, but rather as my own tool for manipulating a massive data 
set (I have 5 billion transaction records (600gb in csv) and want to pull out 
unique user pairs, estimate aggregates for individual users, etc.). This also 
means almost no writing, except to creation of new tables based on selections 
from the main table. 
I'm on a Windows 8 VM with 16gb ram, SCSI VMware HD, and 3 cores if that's 
important.

I'd go with a small shared_buffers, like 128MB, and let the OS cache as much as 
possible.  This minimizes the amount of double buffering.

And set work_mem to about 6GB, then bump it up if that doesn't seem to cause 
problems.

In the scenario you describe, it is probably no big deal if you guess too high. 
 Monitor the process; if it starts to go nuts just kill it and start again 
with a lower work_mem.  If it is a single user system, you can afford to be 
adventurous.

If you need to build indexes, you should bump up maintenance_work_mem, but I 
just would do that in the local session not system wide.

Cheers,

Jeff
 


Quick followup Jeff: it seems that I can't set work_mem above about 1gb (can't 
get to 2gb. When I update config, the values just don't change in SHOW ALL -- 
integer constraint?). Is there a work around, or should I tweak something else 
accordingly? 

Thanks!

Nick

(Properly bottom posted this time?) 

[Schnabel, Robert D.] 

Nick,

I asked the same question a while ago about work_mem on Windows.  See this 
thread:
http://www.postgresql.org/message-id/17895.1315869...@sss.pgh.pa.us

Bob




Re: [PERFORM] Tuning Postgres for Single connection use

2014-04-15 Thread Tom Lane
Jeff Janes jeff.ja...@gmail.com writes:
 On Tue, Apr 15, 2014 at 9:12 AM, Nick Eubank nickeub...@gmail.com wrote:
 Quick followup Jeff: it seems that I can't set work_mem above about 1gb
 (can't get to 2gb. When I update config, the values just don't change in
 SHOW ALL -- integer constraint?). Is there a work around, or should I
 tweak something else accordingly?

 What version are you using?  What is the exact line you put in your config
 file?  Did you get any errors when using that config file?  Are you sure
 you actually reloaded the server, so that it reread the config file, rather
 than just changing the file and then not applying the change?

 I usually set work_mem within a psql connection, in which case you need to
 quote the setting if you use units:
  set work_mem = '3GB';

FWIW, it's generally considered a seriously *bad* idea to set work_mem as
high as 1GB in postgresql.conf: you're promising that each query running
on the server can use 1GB per sort or hash step.  You probably don't have
the machine resources to honor that promise.  (If you do, I'd like to have
your IT budget ;-))  Recommended practice is to keep the global setting
conservatively small, and bump it up locally in your session (with SET)
for individual queries that need the very large value.

But having said that, Postgres doesn't try to enforce any such practice.
My money is on what Jeff is evidently thinking: you forgot to do pg_ctl
reload, or else the setting is too large for your platform, in which case
there should have been a complaint in the postmaster log.  As noted
elsewhere, the limit for Windows is a hair under 2GB even if it's 64-bit
Windows.
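
A quick way to see both what the running server actually picked up and the
hard upper bound it will accept (max_val is reported in kB; on Windows it
tops out just under 2GB, as above):

SELECT name, setting, unit, min_val, max_val, source
  FROM pg_settings
 WHERE name = 'work_mem';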

regards, tom lane





[PERFORM] Tuning Postgres for Single connection use

2014-04-15 Thread Nick Eubank


 On Tuesday, April 15, 2014, Tom Lane t...@sss.pgh.pa.us wrote:

 Jeff Janes jeff.ja...@gmail.com writes:
  On Tue, Apr 15, 2014 at 9:12 AM, Nick Eubank nickeub...@gmail.com
 wrote:
  Quick followup Jeff: it seems that I can't set work_mem above about 1gb
  (can't get to 2gb. When I update config, the values just don't change in
  SHOW ALL -- integer constraint?). Is there a work around, or should I
  tweak something else accordingly?

  What version are you using?  What is the exact line you put in your
 config
  file?  Did you get any errors when using that config file?  Are you sure
  you actually reloaded the server, so that it reread the config file,
 rather
  than just changing the file and then not applying the change?

  I usually set work_mem within a psql connection, in which case you need
 to
  quote the setting if you use units:
  set work_mem = '3GB';

 FWIW, it's generally considered a seriously *bad* idea to set work_mem as
 high as 1GB in postgresql.conf: you're promising that each query running
 on the server can use 1GB per sort or hash step.  You probably don't have
 the machine resources to honor that promise.  (If you do, I'd like to have
 your IT budget ;-))  Recommended practice is to keep the global setting
 conservatively small, and bump it up locally in your session (with SET)
 for individual queries that need the very large value.

 But having said that, Postgres doesn't try to enforce any such practice.
 My money is on what Jeff is evidently thinking: you forgot to do pg_ctl
 reload, or else the setting is too large for your platform, in which case
 there should have been a complaint in the postmaster log.  As noted
 elsewhere, the limit for Windows is a hair under 2GB even if it's 64-bit
 Windows.

 regards, tom lane


Thanks Tom -- quick follow up: I know that 1GB work_mem is a terrible idea
for normal Postgres setups with lots of concurrent users, but for my
situation, where there will only ever be one connection running one query,
why is that a problem on a machine with 16GB of RAM?

Re: Robert -- thanks for that clarification!


Re: [PERFORM] Testing strategies

2014-04-15 Thread Matheus de Oliveira
On Tue, Apr 15, 2014 at 12:57 PM, Dave Cramer davecra...@gmail.com wrote:

 I have a client wanting to test PostgreSQL on ZFS running Linux. Other
 than pg_bench are there any other benchmarks that are easy to test?


Check Gregory Smith article about testing disks [1].

[1] http://www.westnet.com/~gsmith/content/postgresql/pg-disktesting.htm
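
For the fragmentation question, one crude approach is simply to churn the
pgbench tables with random updates for a while before the timed run, so that a
copy-on-write filesystem like ZFS relocates the blocks. A sketch only (the
100000 here covers scale factor 1; multiply by your pgbench scale factor):

-- run this several times (or in a loop) before the real benchmark
UPDATE pgbench_accounts
   SET abalance = abalance + 1
 WHERE aid IN (SELECT (random() * 100000)::int + 1
                 FROM generate_series(1, 50000));
VACUUM pgbench_accounts;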

-- 
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres


[PERFORM] Workaround for working_mem max value in windows?

2014-04-15 Thread Nick Eubank
Hi all,

A few years ago someone said Postgres on Windows can't set work_mem above
about 2 GB (www.postgresql.org/message-id/17895.1315869...@sss.pgh.pa.us --
seems to be the same for maintenance_work_mem). I'm finding the limit is still
present.

I'm doing single-user, single-connection, data-intensive queries and would
like to set a higher value on Windows to make better use of the 16 GB of
built-in RAM (I don't control the platform, so I can't jump to Linux).

Anyone found a workaround?

Thanks!

Nick