Re: [PERFORM] HFS+ pg_test_fsync performance

2014-04-15 Thread desmodemone
2014-04-15 0:32 GMT+02:00 Mel Llaguno :

> I was given anecdotal information regarding HFS+ performance under OSX as
> being unsuitable for production PG deployments and that pg_test_fsync
> could be used to measure the relative speed versus other operating systems
> (such as Linux). In my performance lab, I have a number of similarly
> equipped Linux hosts (Ubuntu 12.04 64-bit LTS Server w/128Gb RAM / 2 OWC
> 6g Mercury Extreme SSDs / 7200rpm SATA3 HDD / 16 E5-series cores) that I
> used to capture baseline Linux numbers. As we generally recommend our
> customers use SSD (the s3700 recommended by PG), I wanted to perform a
> comparison. On these beefy machines I ran the following tests:
>
> SSD:
>
> # pg_test_fsync -f ./fsync.out -s 30
> 30 seconds per test
> O_DIRECT supported on this platform for open_datasync and open_sync.
>
> Compare file sync methods using one 8kB write:
> (in wal_sync_method preference order, except fdatasync
> is Linux's default)
> open_datasync                              n/a
> fdatasync                     2259.652 ops/sec     443 usecs/op
> fsync                         1949.664 ops/sec     513 usecs/op
> fsync_writethrough                         n/a
> open_sync                     2245.162 ops/sec     445 usecs/op
>
> Compare file sync methods using two 8kB writes:
> (in wal_sync_method preference order, except fdatasync
> is Linux's default)
> open_datasync                              n/a
> fdatasync                     2161.941 ops/sec     463 usecs/op
> fsync                         1891.894 ops/sec     529 usecs/op
> fsync_writethrough                         n/a
> open_sync                     1118.826 ops/sec     894 usecs/op
>
> Compare open_sync with different write sizes:
> (This is designed to compare the cost of writing 16kB
> in different write open_sync sizes.)
>  1 * 16kB open_sync write     2171.558 ops/sec     460 usecs/op
>  2 *  8kB open_sync writes    1126.490 ops/sec     888 usecs/op
>  4 *  4kB open_sync writes     569.594 ops/sec    1756 usecs/op
>  8 *  2kB open_sync writes     285.149 ops/sec    3507 usecs/op
> 16 *  1kB open_sync writes     142.528 ops/sec    7016 usecs/op
>
> Test if fsync on non-write file descriptor is honored:
> (If the times are similar, fsync() can sync data written
> on a different descriptor.)
> write, fsync, close           1947.557 ops/sec     513 usecs/op
> write, close, fsync           1951.082 ops/sec     513 usecs/op
>
> Non-Sync'ed 8kB writes:
> write                       481296.909 ops/sec       2 usecs/op
>
>
> HDD:
>
> pg_test_fsync -f /tmp/fsync.out -s 30
> 30 seconds per test
> O_DIRECT supported on this platform for open_datasync and open_sync.
>
> Compare file sync methods using one 8kB write:
> (in wal_sync_method preference order, except fdatasync
> is Linux's default)
> open_datasync                              n/a
> fdatasync                      105.783 ops/sec    9453 usecs/op
> fsync                           27.692 ops/sec   36111 usecs/op
> fsync_writethrough                         n/a
> open_sync                      103.399 ops/sec    9671 usecs/op
>
> Compare file sync methods using two 8kB writes:
> (in wal_sync_method preference order, except fdatasync
> is Linux's default)
> open_datasync                              n/a
> fdatasync                      104.647 ops/sec    9556 usecs/op
> fsync                           27.223 ops/sec   36734 usecs/op
> fsync_writethrough                         n/a
> open_sync                       55.839 ops/sec   17909 usecs/op
>
> Compare open_sync with different write sizes:
> (This is designed to compare the cost of writing 16kB
> in different write open_sync sizes.)
>  1 * 16kB open_sync write      103.581 ops/sec    9654 usecs/op
>  2 *  8kB open_sync writes      55.207 ops/sec   18113 usecs/op
>  4 *  4kB open_sync writes      28.320 ops/sec   35311 usecs/op
>  8 *  2kB open_sync writes      14.581 ops/sec   68582 usecs/op
> 16 *  1kB open_sync writes       7.407 ops/sec  135003 usecs/op
>
> Test if fsync on non-write file descriptor is honored:
> (If the times are similar, fsync() can sync data written
> on a different descriptor.)
> write, fsync, close             27.228 ops/sec   36727 usecs/op
> write, close, fsync             27.108 ops/sec   36890 usecs/op
>
> Non-Sync'ed 8kB writes:
> write                       466108.001 ops/sec       2 usecs/op
>
>
> ---
>
> So far, so good. Local HDD vs. SSD shows a significant difference in fsync
> performance. Here are the corresponding fstab entries:
>
> /dev/mapper/cim-base
> /opt/cimext4d

Re: [PERFORM] HFS+ pg_test_fsync performance

2014-04-15 Thread Mel Llaguno


My 2 cents:

The results are not surprising. In the Linux environment, the I/O calls of
pg_test_fsync use O_DIRECT (PG_O_DIRECT) together with the O_SYNC or O_DSYNC
flags, so in practice each write waits for the "answer" from the storage,
bypassing the cache in sync mode. On Mac OS X it does not do this: it only
uses the O_SYNC or O_DSYNC flags without O_DIRECT, so in practice it still
goes through the filesystem cache even though it asks for synchronous I/O.


Bye

Mat Dba
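
For reference, the sync methods pg_test_fsync exercises map onto the server's
wal_sync_method setting, so it can help to check what the server itself will
use. A minimal sketch (the fsync_writethrough note reflects the documented
behaviour on OS X and is worth verifying on the target machine):

-- which method the server will use to sync WAL
SHOW wal_sync_method;

-- On OS X, fsync_writethrough is the method documented to force writes
-- through the drive cache (F_FULLFSYNC). To experiment, set it in
-- postgresql.conf and reload:
--   wal_sync_method = fsync_writethrough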



Thanks for the explanation. Given that OSX always seems to use filesystem 
cache, is there a way to measure fsync performance that is equivalent to Linux? 
Or will the numbers from pg_test_fsync always be inflated under OSX? The reason I ask 
is that we would like to make a case with a customer that PG performance on 
OSX/HFS+ would be sub-optimal compared to using Linux/EXT4 (or FreeBSD/UFS2 for 
that matter).

Thanks, Mel


Re: [PERFORM] [GENERAL] Approach to Data Summary and Analysis

2014-04-15 Thread Chris Curvey
On Mon, Apr 14, 2014 at 12:27 PM, Robert DiFalco
wrote:

> I have several related tables that represent a call state. Let's think of
> these as phone calls to simplify things. Sometimes I need to determine the
> last time a user was called, the last time a user answered a call, or the
> last time a user completed a call.
>
> The basic schema is something like this:
>
> CREATE TABLE calls (
>   id  BIGINT NOT NULL, // sequence generator
>   user_id BIGINT NOT NULL,
>   called  TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
>
>   PRIMARY KEY (id),
>   FOREIGN KEY (user_id) REFERENCES my_users(id) ON DELETE CASCADE
> );
>
> CREATE TABLE calls_answered (
>   idBIGINT NOT NULL,
>   answered  TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
>
>   PRIMARY KEY (id),
>   FOREIGN KEY (id) REFERENCES calls(id) ON DELETE CASCADE
> );
>
>
> And so on for calls_connected, calls_completed, call_errors, etc.
>
> Occasionally I will want to know things like "When was the last time a
> user answered a call" or "How many times has a user been called".
>
> I can do these queries using a combination of MAX or COUNT. But I'm
> concerned about the performance.
>
> SELECT MAX(a.id)
> FROM calls_answered a JOIN calls c ON c.id = a.id
> WHERE c.user_id = ?;
>
>
> Or the number of answered calls:
>
> SELECT COUNT(a.id)
> FROM calls_answered a JOIN calls c ON c.id = a.id
> WHERE c.user_id = ?;
>
>
> Sometimes I might want to get this data for a whole bunch of users. For
> example, "give me all users whose have not answered a call in the last 5
> days." Or even "what percentage of users called actually answered a call."
> This approach could become a performance issue. So the other option is to
> create a call_summary table that is updated with triggers.
>
> The summary table would need fields like "user_id", "last_call_id",
> "call_count", "last_answered_id", "answered_count", "last_completed_id",
> "last_completed_count", etc.
>
> My only issue with a summary table is that I don't want a bunch of null
> fields. For example, if the user was *called* but they have never
> *answered* a call then the last_call_id and call_count fields on the
> summary table would be non-NULL but the last_answer_id and answer_count
> fields WOULD be NULL. But over time all fields would eventually become
> non-NULL.
>
> So that leads me to a summary table for EACH call state. Each summary
> table would have a user id, a ref_id, and a count -- one summary table for
> each state e.g. call_summary, call_answered_summary, etc.
>
> This approach has the down side that it creates a lot of tables and
> triggers. It has the upside of being pretty efficient without having to
> deal with NULL values.  It's also pretty easy to reason about.
>
> So for my question -- is the choice between these a personal preference
> sort of thing or is there a right or wrong approach? Am I missing another
> approach that would be better?  I'm okay with SQL but I'm not an expert so I'm
> not sure if there is an accepted DESIGN PATTERN for this that I am missing.
>
> Thanks!
>
>
>
My initial thought is: that design is over-normalized. The thing you are
trying to model is the call, and it has several attributes, some of which
may be unknown or not applicable (which is what NULL is for).  So my
thought would be to do something like this:

CREATE TABLE calls (
  id  BIGINT NOT NULL, -- sequence generator
  user_id BIGINT NOT NULL,
  called  TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
  answered  TIMESTAMPTZ,

  PRIMARY KEY (id),
  FOREIGN KEY (user_id) REFERENCES my_users(id) ON DELETE CASCADE
);


-- 
I asked the Internet how to train my cat, and the Internet told me to get a
dog.


Re: [PERFORM] [GENERAL] Approach to Data Summary and Analysis

2014-04-15 Thread Chris Curvey
(Sorry, fat-fingered and hit "send too early"...)

CREATE TABLE calls (
  id  BIGINT NOT NULL, -- sequence generator
  user_id BIGINT NOT NULL,
  called  TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
  answered  TIMESTAMPTZ NULL,
  connected  TIMESTAMPTZ NULL,
  completed  TIMESTAMPTZ NULL,


  PRIMARY KEY (id),
  FOREIGN KEY (user_id) REFERENCES my_users(id) ON DELETE CASCADE
);

Then your queries end up looking like this:

--last time john answered
SELECT MAX(answered)
FROM calls
where answered is not null
and user_id = ?

-- users that have not answered a call in the last five days (I can think
-- of a few ways to interpret that phrase)
select my_users.*
from my_users
where not exists
(  select *
  from calls
  where calls.user_id = my_users.id
  and answered >= )

-- average ring time
select avg(extract(epoch from answered - called))
from calls
where answered is not null
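
If those per-user lookups become hot, a couple of supporting indexes along
these lines would keep them cheap (index names are illustrative and not part
of the original reply; the partial index is just one option):

-- hypothetical supporting indexes for the single-table design above
CREATE INDEX calls_user_id_called_idx ON calls (user_id, called);
CREATE INDEX calls_user_id_answered_idx ON calls (user_id, answered)
  WHERE answered IS NOT NULL;  -- partial: indexes only answered calls

-- "last time user 42 answered" then reads just that user's answered rows
SELECT MAX(answered) FROM calls WHERE user_id = 42 AND answered IS NOT NULL;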



-- 
I asked the Internet how to train my cat, and the Internet told me to get a
dog.


Re: [PERFORM] [GENERAL] Approach to Data Summary and Analysis

2014-04-15 Thread Robert DiFalco
Actually that was exactly the initial table design. There were more fields
because for my use case there were a lot more states and certain states
have additional data (for example when a call goes from answered to
connected it also gets the user_id of the person being connected to). So
that one table started getting a LOT of columns, which started making it
hard to reason about.

The more normalized version has a couple of things going for it. COUNT,
MIN, MAX, etc are very fast because I don't have to conditionally add null
checks. Everything is inserted so for the millions of calls that get made
the normalized schema was much more efficient for writing. It was also
easier to understand. The answer table only has calls that were answered,
the error table only has calls that resulted in an error after being
connected, etc.

I know this kind of gets into a religious area when discussing NULLs and
what level of normalization is appropriate so I don't want to spark any of
that on this thread. But only doing inserts and never doing updates or
deletes performed very well for large data sets.

That said, I could explore a compromise between the monolithic table
approach and the completely normalized set of tables approach. Thanks for
your input!
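
For concreteness, a minimal sketch of the trigger-maintained summary-table
option described earlier in the thread; the column names follow the ones
listed there, while the function and trigger names (and tracking only the
"called" state) are illustrative assumptions:

CREATE TABLE call_summary (
  user_id      BIGINT PRIMARY KEY REFERENCES my_users(id) ON DELETE CASCADE,
  last_call_id BIGINT,
  call_count   BIGINT NOT NULL DEFAULT 0
);

CREATE FUNCTION bump_call_summary() RETURNS trigger AS $$
BEGIN
  -- update-then-insert, since this predates INSERT ... ON CONFLICT;
  -- concurrent first inserts for the same user would need extra locking
  UPDATE call_summary
     SET last_call_id = NEW.id,
         call_count   = call_count + 1
   WHERE user_id = NEW.user_id;
  IF NOT FOUND THEN
    INSERT INTO call_summary (user_id, last_call_id, call_count)
    VALUES (NEW.user_id, NEW.id, 1);
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER calls_summary_trg
  AFTER INSERT ON calls
  FOR EACH ROW EXECUTE PROCEDURE bump_call_summary();

Each additional state (answered, completed, errors) would need either more
columns here or its own summary table, which is exactly the trade-off being
weighed above.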



[PERFORM] Testing strategies

2014-04-15 Thread Dave Cramer
I have a client wanting to test PostgreSQL on ZFS running Linux.

Other than pg_bench are there any other benchmarks that are easy to test?

One of the possible concerns is fragmentation over time. Any ideas on how
to fragment the database before running pg_bench?

Also there is some concern about fragmentation of the WAL logs. I am
looking at testing with and without the WAL logs on ZFS. Any other specific
concerns?


Dave Cramer
credativ ltd (Canada)

78 Zina St
Orangeville, ON
Canada. L9W 1E8

Office: +1 (905) 766-4091
Mobile: +1 (519) 939-0336

===
Canada:  http://www.credativ.ca
USA: http://www.credativ.us
Germany: http://www.credativ.de
Netherlands: http://www.credativ.nl
UK:  http://www.credativ.co.uk
India:   http://www.credativ.in
===


Re: [PERFORM] Tuning Postgres for Single connection use

2014-04-15 Thread Nick Eubank
On Mon, Apr 14, 2014 at 5:19 PM, Jeff Janes  wrote:

> On Mon, Apr 14, 2014 at 2:46 PM, Nick Eubank  wrote:
>
>> Any rules of thumb for work_mem, maintenance_work_mem, shared_buffer,
>> etc. for a database that DOESN'T anticipate concurrent connections and that
>> is doing lots of aggregate functions on large tables? All the advice I
>> can find online on tuning (this, this, this etc.)
>> is written for people anticipating lots of concurrent connections.
>>
>> I'm a social scientist looking to use Postgres not as a database to be
>> shared by multiple users, but rather as my own tool for manipulating a
>> massive data set (I have 5 billion transaction records (600gb in csv) and
>> want to pull out unique user pairs, estimate aggregates for individual
>> users, etc.). This also means almost no writing, except to creation of new
>> tables based on selections from the main table.
>>
>> I'm on a Windows 8 VM with 16gb ram, SCSI VMware HD, and 3 cores if
>> that's important.
>>
>
> I'd go with a small shared_buffers, like 128MB, and let the OS cache as
> much as possible.  This minimizes the amount of double buffering.
>
> And set work_mem to about 6GB, then bump it up if that doesn't seem to
> cause problems.
>
> In the scenario you describe, it is probably no big deal if you guess too
> high.  Monitor the process, if it starts to go nuts just kill it and
> start again with a lower work_mem.  If it is a single user system, you can
> afford to be adventurous.
>
> If you need to build indexes, you should bump up maintenance_work_mem, but
> I just would do that in the local session not system wide.
>
> Cheers,
>
> Jeff
>
>
>
Quick followup Jeff: it seems that I can't set work_mem above about 1gb
(can't get to 2gb. When I update config, the values just don't change in
"SHOW ALL" -- integer constraint?). Is there a work around, or should I
tweak something else accordingly?

Thanks!

Nick

(Properly bottom posted this time?)


Re: [PERFORM] Tuning Postgres for Single connection use

2014-04-15 Thread Jeff Janes
On Tue, Apr 15, 2014 at 9:12 AM, Nick Eubank  wrote:

> Quick followup Jeff: it seems that I can't set work_mem above about 1gb
> (can't get to 2gb. When I update config, the values just don't change in
> "SHOW ALL" -- integer constraint?). Is there a work around, or should I
> tweak something else accordingly?
>


What version are you using?  What is the exact line you put in your config
file?  Did you get any errors when using that config file?  Are you sure
you actually reloaded the server, so that it reread the config file, rather
than just changing the file and then not applying the change?

I usually set work_mem within a psql connection, in which case you need to
quote the setting if you use units:

set work_mem="3GB";

But if you set it system wide in the config file the quotes should not be
needed.



> Thanks!
>
> Nick
>
> (Properly bottom posted this time?)
>

Looked good to me.

Cheers,

Jeff


Re: [PERFORM] Tuning Postgres for Single connection use

2014-04-15 Thread Schnabel, Robert D.



Nick,

I asked the same question a while ago about work_mem on Windows.  See this 
thread:
http://www.postgresql.org/message-id/17895.1315869...@sss.pgh.pa.us

Bob




Re: [PERFORM] Tuning Postgres for Single connection use

2014-04-15 Thread Tom Lane
Jeff Janes  writes:
> On Tue, Apr 15, 2014 at 9:12 AM, Nick Eubank  wrote:
>> Quick followup Jeff: it seems that I can't set work_mem above about 1gb
>> (can't get to 2gb. When I update config, the values just don't change in
>> "SHOW ALL" -- integer constraint?). Is there a work around, or should I
>> tweak something else accordingly?

> What version are you using?  What is the exact line you put in your config
> file?  Did you get any errors when using that config file?  Are you sure
> you actually reloaded the server, so that it reread the config file, rather
> than just changing the file and then not applying the change?

> I usually set work_mem within a psql connection, in which case you need to
> quote the setting if you use units:
> set work_mem="3GB";

FWIW, it's generally considered a seriously *bad* idea to set work_mem as
high as 1GB in postgresql.conf: you're promising that each query running
on the server can use 1GB per sort or hash step.  You probably don't have
the machine resources to honor that promise.  (If you do, I'd like to have
your IT budget ;-))  Recommended practice is to keep the global setting
conservatively small, and bump it up locally in your session (with SET)
for individual queries that need the very large value.
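
In SQL terms that practice looks roughly like this (the value is purely
illustrative):

-- keep the postgresql.conf value small; raise it only where needed
SET work_mem = '1GB';
-- ... run the one big sort/aggregate here ...
RESET work_mem;

-- or scope it to a single transaction:
BEGIN;
SET LOCAL work_mem = '1GB';
-- big query
COMMIT;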

But having said that, Postgres doesn't try to enforce any such practice.
My money is on what Jeff is evidently thinking: you forgot to do "pg_ctl
reload", or else the setting is too large for your platform, in which case
there should have been a complaint in the postmaster log.  As noted
elsewhere, the limit for Windows is a hair under 2GB even if it's 64-bit
Windows.

regards, tom lane





Re: [PERFORM] Tuning Postgres for Single connection use

2014-04-15 Thread Nick Eubank
On Tuesday, April 15, 2014, Tom Lane wrote:


Thanks Tom -- quick follow up: I know that 1GB work_mem is a terrible idea
for normal postgres users with lots of concurrent users, but for my
situation where there will only ever be one connection running one query,
why is that a problem on a machine with 16GB of RAM?

Re: Robert -- thanks for that clarification!


Re: [PERFORM] Testing strategies

2014-04-15 Thread Matheus de Oliveira
On Tue, Apr 15, 2014 at 12:57 PM, Dave Cramer  wrote:

> I have a client wanting to test PostgreSQL on ZFS running Linux. Other
> than pg_bench are there any other benchmarks that are easy to test?


Check Gregory Smith's article about testing disks [1].

[1] http://www.westnet.com/~gsmith/content/postgresql/pg-disktesting.htm

-- 
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres


[PERFORM] Workaround for work_mem max value in Windows?

2014-04-15 Thread Nick Eubank
Hi all,

A few years ago someone said postgres windows can't set work_mem above
about 2 GB (www.postgresql.org/message-id/17895.1315869...@sss.pgh.pa.us --
seems to be the same for maintenance_work_mem). I'm finding the limit is
still present.

 I'm doing single user, single connection data intensive queries and would
like to set a higher value on windows to better use 16gb built in
ram (don't control platform, so can't jump to Linux).

Anyone found a work around?

Thanks!

Nick


Re: [PERFORM] Workaround for work_mem max value in Windows?

2014-04-15 Thread Martin French
> Hi all,
> 
> A few years ago someone said postgres windows can't set work_mem
> above about 2 GB (
www.postgresql.org/message-id/17895.1315869...@sss.pgh.pa.us
> -- seems to be the same for maintenance_work_mem). I'm finding
> the limit is still present.

Setting work_mem higher than 2GB on a 16GB machine could easily run the 
server out of memory.

work_mem is set on a "per client" and "per sort" basis, so setting it to
2GB could exhaust the available RAM very quickly on complex queries with
multiple sorts, or with more than 8 clients (although you mention a single
user, that doesn't mean there is only 1 connection to the database).

The same rule applies to maintenance_work_mem: more than one autovacuum
worker would use that many multiples of maintenance_work_mem, again
exhausting the server very quickly.
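
A self-contained way to see how many work_mem-sized steps even a single query
can involve (the example query is illustrative, not from this thread):

EXPLAIN ANALYZE
SELECT g % 1000 AS bucket, count(*)
FROM generate_series(1, 1000000) AS g
GROUP BY 1
ORDER BY 2 DESC;
-- the plan will typically show both an aggregate and a sort node; with
-- work_mem = 2GB each such node may claim up to 2GB, and every additional
-- connection multiplies that again.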

> 
>  I'm doing single user, single connection data intensive queries and
> would like to set a higher value on windows to better use 16gb built
> in ram (don't control platform, so can't jump to Linux). 
> 
> Anyone found a work around?
> 

The PostgreSQL Windows builds are maintained by EnterpriseDB IIRC, so maybe 
someone on their forums has ideas on this, as I doubt very much that the extra 
work in the PostgreSQL core would be undertaken, given the comment by Tom in 
the thread you posted.

http://forums.enterprisedb.com/forums/list.page

Cheers
=

Romax Technology Limited 
A limited company registered in England and Wales.
Registered office:
Rutherford House 
Nottingham Science and Technology Park 
Nottingham 
NG7 2PZ 
England
Registration Number: 2345696
VAT Number: 526 246 746

Telephone numbers:
+44 (0)115 951 88 00 (main)

For other office locations see:
http://www.romaxtech.com/Contact
=
===
E-mail: i...@romaxtech.com
Website: www.romaxtech.com
=

