[PERFORM] Using high speed swap to improve performance?

2010-04-02 Thread Christiaan Willemsen

Hi there,

 

About a year ago we set up a machine with sixteen 15k disk spindles on Solaris 
using ZFS. Now that Oracle has taken over Sun and is closing up Solaris, we want 
to move away (we are more familiar with Linux anyway).

 

So the plan is to move to Linux and put the data on a SAN using iSCSI (two or 
four network interfaces). This however leaves us with 16 very nice disks doing 
nothing, which sounds like a waste. If we were using Solaris, ZFS would have a 
solution: use them as L2ARC. But there is no Linux filesystem with those 
features (ZFS on FUSE is not really an option).

 

So I was thinking: why not make a big fat array using 14 disks (RAID 1, 10 or 
5) and use it as a big, fast swap disk? Latency will be lower than the SAN can 
provide, throughput will also be better, and it will relieve the SAN of a lot of 
read IOPS.

 

So I could create a 1TB swap disk and present it to the OS alongside the 64GB of 
memory. Then I can set Postgres to use more than the RAM size, so it will start 
swapping. To Postgres it would appear that the complete database fits into 
memory. The question is: will this do any good? And if so: what will happen?
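
Roughly what I have in mind, as a sketch only (device names, the RAID level and 
the exact Postgres setting are placeholders):

# build the swap array out of the 14 spare disks
mdadm --create /dev/md0 --level=10 --raid-devices=14 /dev/sd[b-o]
mkswap /dev/md0
swapon -p 100 /dev/md0     # give it a higher priority than any existing swap

# then in postgresql.conf, something deliberately larger than the 64GB of RAM,
# e.g. sized towards the database itself (assuming shared_buffers is the knob meant):
# shared_buffers = 256GB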

 

Kind regards,

 

Christiaan


Re: [PERFORM] Using high speed swap to improve performance?

2010-04-02 Thread Arjen van der Meijden
What about FreeBSD with ZFS? I have no idea which features they support 
and which they don't, but it is at least a bit more free than Solaris and still 
offers that very nice file system.


Best regards,

Arjen

On 2-4-2010 21:15 Christiaan Willemsen wrote:

Hi there,

About a year ago we set up a machine with sixteen 15k disk spindles on
Solaris using ZFS. Now that Oracle has taken over Sun and is closing up
Solaris, we want to move away (we are more familiar with Linux anyway).

So the plan is to move to Linux and put the data on a SAN using iSCSI
(two or four network interfaces). This however leaves us with 16 very
nice disks doing nothing, which sounds like a waste. If we were using
Solaris, ZFS would have a solution: use them as L2ARC. But there is no
Linux filesystem with those features (ZFS on FUSE is not really an option).

So I was thinking: why not make a big fat array using 14 disks (RAID 1,
10 or 5) and use it as a big, fast swap disk? Latency will be lower
than the SAN can provide, throughput will also be better, and it will
relieve the SAN of a lot of read IOPS.

So I could create a 1TB swap disk and present it to the OS alongside the
64GB of memory. Then I can set Postgres to use more than the RAM size,
so it will start swapping. To Postgres it would appear that the complete
database fits into memory. The question is: will this do any good?
And if so: what will happen?

Kind regards,

Christiaan





Re: [PERFORM] How much memory is PostgreSQL using

2010-04-02 Thread Campbell, Lance
Greg,
Thanks for your help.

1) How does the number of buffers provided by pg_buffercache compare to
memory (buffers * X = Y meg)?  
2) Is there a way to tell how many total buffers I have available/max?

Thanks,

Lance Campbell
Software Architect/DBA/Project Manager
Web Services at Public Affairs
217-333-0382


-Original Message-
From: Greg Smith [mailto:g...@2ndquadrant.com] 
Sent: Monday, March 29, 2010 11:54 AM
To: Campbell, Lance
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] How much memory is PostgreSQL using

Campbell, Lance wrote:

 Or is there some way to ask PostgreSQL how much memory are you using 
 to cache disk blocks currently?


You can install contrib/pg_buffercache into each database and count how 
many used blocks are there.  Note that running queries using that 
diagnostic tool is really intensive due to the locks it takes, so be 
careful not to do that often on a production system.
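
For illustration, a sketch of the kind of query this allows (assuming 
pg_buffercache is installed in the current database and the default 8 kB 
block size):

-- buffers currently caching a disk block, and the memory that represents
select count(*)            as used_buffers,
       count(*) * 8 / 1024 as used_mb
from pg_buffercache
where relfilenode is not null;

-- total buffers available; this simply equals shared_buffers
select count(*) as total_buffers from pg_buffercache;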


 When you do a PG_DUMP does PostgreSQL put the disk blocks into shared 
 buffers as it runs?


To some extent.  Most pg_dump activity involves sequential scans that 
are reading an entire table.  Those are no different from any other 
process that will put disk blocks into shared_buffers.  However, that 
usage pattern makes pg_dump particularly likely to run into an 
optimization in 8.3 and later that limits how much of shared_buffers is 
used when sequentially scanning a large table.  See P10 of 
http://www.westnet.com/~gsmith/content/postgresql/InsideBufferCache.pdf 
for the exact implementation.  Basically, anything bigger than 
shared_buffers / 4 uses a 256K ring to limit its cache use, but it's a 
little more complicated than that.
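
To put rough numbers on that (assuming the default 8 kB block size): with 
shared_buffers set to 2GB, any table larger than about 512MB is read through a 
ring of roughly 256 kB, i.e. around 32 buffers, instead of being allowed to push 
the rest of the cache out.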

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




[PERFORM] Does the psql executable support a fetch many approach when dumping large queries to stdout?

2010-04-02 Thread Beaver, John E
Does the psql executable have any ability to do a fetch many, using a 
server-side named cursor, when returning results? It seems like it tries to 
retrieve the query entirely to local memory before printing to standard out.

Specifically, I've tried running the following command on my desktop, which 
returns about 70 million lines:
echo "select [thirteen columns, all floating point and boolean types] from 
go_prediction_view_mouse gpv order by combined_score desc nulls last" | psql -U 
[username] -h [remote host] [database] > mouse_predictions.tab

The command's memory usage increases until about 12GB, at which point it stalls and 
there is significant disk activity (presumably paging to disk). Closing a few big 
programs immediately causes it to increase its memory usage accordingly. After 
about 50 minutes, I killed it.

If I instead run the Python program below, which simply performs the query 
using a fetchmany() call to retrieve a few hundred thousand tuples at a time 
through a named server-side cursor, the program remains under about 20 MB of 
memory usage throughout and finishes in about 35 minutes.

I know that the query used here could have been a COPY statement, which I 
assume would be better-behaved, but I'm more concerned about the case in which 
the query is more complex.

The local (OSX 10.6.2) version of Postgres is 8.4.2, and the server's (Ubuntu 
4.x) version of Postgres is 8.3.5.


-- Python source to use a named server-side cursor to dump a large number of 
rows to a file --

import psycopg2
import psycopg2.extensions

#---Modify this section---#
species = 'mouse'
query = ('select go_term_ref, gene_remote_id, function_verified_exactly, '
         'function_verified_with_parent_go, function_verified_with_child_go, '
         'combined_score, obozinski_score, lee_score, mostafavi_score, guan_score, '
         'kim_score, joshi_score, tasan_score, tasan_revised_score, qi_score, '
         'leone_score from go_prediction_view_' + species + ' gpv '
         'order by combined_score desc nulls last')
outputFilePath = [*output file path*]
connectionString = [*connection string*]
queryBufferSize = 1  # rows fetched per fetchmany() round trip

def processRow(row):
    # not important -- the body was omitted in the original post; it just has
    # to return a sequence of strings for the join below
    return [str(value) for value in row]
#---End modify this section---#


#---Everything below should be generic---#
conn = psycopg2.connect(connectionString)
cur = conn.cursor('temp_dump_cursor')  # a named cursor is held server-side
cur.arraysize = queryBufferSize

def resultIter(cursor):
    'An iterator that uses fetchmany to keep memory usage down'
    done = False
    while not done:
        results = cursor.fetchmany(queryBufferSize)
        if not results or len(results) == 0:
            done = True
        for result in results:
            yield result

with open(outputFilePath, 'w') as file:
    print 'Starting ' + query + '...'
    cur.execute(query)
    i = 0
    for row in resultIter(cur):
        i += 1
        row = processRow(row)
        file.write('\t'.join(row) + '\n')

        if i % queryBufferSize == 0:
            print 'Wrote ' + str(i) + ' lines.'

    print 'finished. Total of ' + str(i) + ' lines.'

cur.close()
conn.close()


Re: *** PROBABLY SPAM *** [PERFORM] Does the psql executable support a fetch many approach when dumping large queries to stdout?

2010-04-02 Thread Pierre C
Does the psql executable have any ability to do a fetch many, using a  
server-side named cursor, when returning results? It seems like it tries  
to retrieve the query entirely to local memory before printing to  
standard out.


I think it grabs the whole result set to calculate the display column 
widths. I think there is an option to tweak this, but I don't remember which; 
have a look at the psql commands (\?), formatting section.
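
For reference, the variable Pierre is probably thinking of is FETCH_COUNT 
(available in psql since 8.2), which makes psql fetch the result through a 
cursor in batches instead of buffering it all. A minimal sketch, reusing the 
file name from the original post and abbreviating the column list to *:

\set FETCH_COUNT 10000
\pset format unaligned
\o mouse_predictions.tab
select * from go_prediction_view_mouse gpv order by combined_score desc nulls last;
\o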




Re: [PERFORM] Does the psql executable support a fetch many approach when dumping large queries to stdout?

2010-04-02 Thread John Beaver




Ah, you're right. Thanks Hannu, that's a good solution.

Hannu Krosing wrote:

On Fri, 2010-04-02 at 16:28 -0400, Beaver, John E wrote:
...

 I know that the query used here could have been a COPY statement, which I
 assume would be better-behaved, but I'm more concerned about the case in
 which the query is more complex.

COPY can copy out results of a SELECT query as well.
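
For instance (a sketch only; the full column list from the original query is 
abbreviated to * here), either server-side or through psql's \copy:

copy (select * from go_prediction_view_mouse gpv
      order by combined_score desc nulls last) to stdout;

-- or client-side, writing straight to a local file:
\copy (select * from go_prediction_view_mouse gpv order by combined_score desc nulls last) to 'mouse_predictions.tab'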

  


-- 
John E. Beaver
Bioinformatics Developer
Harvard Medical School




Re: *** PROBABLY SPAM *** [PERFORM] Does the psql executable support a fetch many approach when dumping large queries to stdout?

2010-04-02 Thread John Beaver




That makes sense. I'll just use a COPY statement instead like Hannu
suggests.

Pierre C wrote:

  
 Does the psql executable have any ability to do a "fetch many", using a
 server-side named cursor, when returning results? It seems like it tries
 to retrieve the query entirely to local memory before printing to
 standard out.

I think it grabs the whole result set to calculate the display column
widths. I think there is an option to tweak this, but I don't remember which;
have a look at the psql commands (\?), formatting section.


-- 
John E. Beaver
Bioinformatics Developer
Harvard Medical School



