[PERFORM] Using high speed swap to improve performance?
Hi there, About a year ago we set up a machine with sixteen 15k disk spindles on Solaris using ZFS. Now that Oracle has taken over Sun and is closing up Solaris, we want to move away (we are more familiar with Linux anyway). So the plan is to move to Linux and put the data on a SAN using iSCSI (two or four network interfaces). This however leaves us with 16 very nice disks doing nothing, which sounds like a waste of resources. If we stayed on Solaris, ZFS would have a solution: use them as L2ARC. But no Linux filesystem offers those features (ZFS on FUSE is not really an option). So I was thinking: why not make a big fat array out of 14 disks (RAID 1, 10 or 5) and use it as a big, fast swap disk? Latency will be lower than the SAN can provide, throughput will also be better, and it will relieve the SAN of a lot of read IOPS. So I could create a 1 TB swap disk and give it to the OS next to the 64 GB of memory. Then I can set Postgres to use more than the RAM size so it will start swapping. To Postgres it would appear that the complete database fits into memory. The question is: will this do any good? And if so: what will happen? Kind regards, Christiaan
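For anyone doing the back-of-the-envelope math on that 14-disk array: usable capacity differs quite a bit between the RAID levels mentioned. A small sketch (the 146 GB per-disk size used in the example is hypothetical; metadata overhead is ignored):

```python
def usable_capacity_gb(n_disks, disk_gb, level):
    """Rough usable capacity of a RAID array, ignoring metadata overhead."""
    if level == "raid10":   # striped mirrors: half the spindles hold copies
        return (n_disks // 2) * disk_gb
    if level == "raid5":    # one disk's worth of parity spread across the set
        return (n_disks - 1) * disk_gb
    if level == "raid1":    # one big mirror set: capacity of a single disk
        return disk_gb
    raise ValueError("unknown RAID level: " + level)

# 14 spindles of a hypothetical 146 GB each:
for level in ("raid10", "raid5", "raid1"):
    print(level, usable_capacity_gb(14, 146, level), "GB")
```

For random reads (the workload swap would see), RAID 10 also lets every spindle serve reads independently, while RAID 5 pays a large penalty on the small random writes that swap generates.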
Re: [PERFORM] Using high speed swap to improve performance?
What about FreeBSD with ZFS? I have no idea which features they support and which not, but it at least is a bit more free than Solaris and still offers that very nice file system. Best regards, Arjen On 2-4-2010 21:15 Christiaan Willemsen wrote: [original message quoted in full] -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] How much memory is PostgreSQL using
Greg, Thanks for your help. 1) How does the number of buffers provided by pg_buffercache compare to memory (buffers * X = Y meg)? 2) Is there a way to tell how many total buffers I have available/max? Thanks, Lance Campbell Software Architect/DBA/Project Manager Web Services at Public Affairs 217-333-0382 -Original Message- From: Greg Smith [mailto:g...@2ndquadrant.com] Sent: Monday, March 29, 2010 11:54 AM To: Campbell, Lance Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] How much memory is PostgreSQL using Campbell, Lance wrote: Or is there some way to ask PostgreSQL how much memory are you using to cache disk blocks currently? You can install contrib/pg_buffercache into each database and count how many used blocks are there. Note that running queries using that diagnostic tool is really intensive due to the locks it takes, so be careful not to do that often on a production system. When you do a PG_DUMP does PostgreSQL put the disk blocks into shared buffers as it runs? To some extent. Most pg_dump activity involves sequential scans that are reading an entire table. Those are no different from any other process that will put disk blocks into shared_buffers. However, that usage pattern makes pg_dump particularly likely to run into an optimization in 8.3 and later that limits how much of shared_buffers is used when sequentially scanning a large table. See P10 of http://www.westnet.com/~gsmith/content/postgresql/InsideBufferCache.pdf for the exact implementation. Basically, anything bigger than shared_buffers / 4 uses a 256K ring to limit its cache use, but it's a little more complicated than that. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us
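On Lance's question 1: each buffer in pg_buffercache is one PostgreSQL block, which is 8 kB unless the server was compiled with a non-default BLCKSZ, so X in "buffers * X" is the block size. For question 2, the total number of buffers available is shared_buffers divided by the block size. A small sketch of the arithmetic:

```python
BLOCK_SIZE = 8192  # default BLCKSZ; 8 kB unless PostgreSQL was built otherwise

def buffers_to_mb(n_buffers, block_size=BLOCK_SIZE):
    """Memory represented by a pg_buffercache buffer count, in megabytes."""
    return n_buffers * block_size / (1024.0 * 1024.0)

def total_buffers(shared_buffers_mb, block_size=BLOCK_SIZE):
    """Maximum buffers available for a given shared_buffers setting."""
    return shared_buffers_mb * 1024 * 1024 // block_size

print(buffers_to_mb(131072))  # 131072 used buffers == 1 GB cached
print(total_buffers(1024))    # shared_buffers = 1GB gives 131072 buffers
```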
[PERFORM] Does the psql executable support a fetch many approach when dumping large queries to stdout?
Does the psql executable have any ability to do a fetch many, using a server-side named cursor, when returning results? It seems like it tries to retrieve the query result entirely into local memory before printing to standard out. Specifically, I've tried running the following command on my desktop, which returns about 70 million lines: echo "select [thirteen columns, all floating point and boolean types] from go_prediction_view_mouse gpv order by combined_score desc nulls last" | psql -U [username] -h [remote host] [database] > mouse_predictions.tab The command increases in memory usage until about 12GB, when it stops and there is significant disk activity (I assume paging to disk). Closing a few big programs immediately causes it to increase its memory usage accordingly. After about 50 minutes, I killed it. If I instead run the Python program below, which simply performs the query using a fetchmany() call to retrieve a few hundred thousand tuples at a time through a named server-side cursor, the program remains under about 20 MB of memory usage throughout and finishes in about 35 minutes. I know that the query used here could have been a COPY statement, which I assume would be better behaved, but I'm more concerned about the case in which the query is more complex. The local (OSX 10.6.2) version of Postgres is 8.4.2, and the server's (Ubuntu 4.x) version of Postgres is 8.3.5.
# Python source to use a named server-side cursor to dump a large number of rows to a file

import psycopg2
import psycopg2.extensions

#---Modify this section---#
species = 'mouse'
query = ('select go_term_ref, gene_remote_id, function_verified_exactly, '
         'function_verified_with_parent_go, function_verified_with_child_go, '
         'combined_score, obozinski_score, lee_score, mostafavi_score, '
         'guan_score, kim_score, joshi_score, tasan_score, tasan_revised_score, '
         'qi_score, leone_score from go_prediction_view_' + species + ' gpv '
         'order by combined_score desc nulls last')
outputFilePath = [*output file path*]
connectionString = [*connection string*]
queryBufferSize = 1

def processRow(row):
    # not important
    return row
#---End modify this section---#

#---Everything below should be generic---#
conn = psycopg2.connect(connectionString)
cur = conn.cursor('temp_dump_cursor')
cur.arraysize = queryBufferSize

def resultIter(cursor):
    'An iterator that uses fetchmany to keep memory usage down'
    while True:
        results = cursor.fetchmany(queryBufferSize)
        if not results:
            break
        for result in results:
            yield result

with open(outputFilePath, 'w') as file:
    print 'Starting ' + query + '...'
    cur.execute(query)
    i = 0
    for row in resultIter(cur):
        i += 1
        row = processRow(row)
        file.write('\t'.join(row) + '\n')
        if i % queryBufferSize == 0:
            print 'Wrote ' + str(i) + ' lines.'

print 'finished. Total of ' + str(i) + ' lines.'
cur.close()
conn.close()
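The batching generator in the script above can be exercised without a database by swapping in a stand-in object that mimics the DB-API fetchmany() interface (the FakeCursor class here is made up for illustration; this sketch is written against Python 3):

```python
class FakeCursor:
    """Stand-in exposing the DB-API fetchmany() interface, no database needed."""
    def __init__(self, rows):
        self._rows = list(rows)

    def fetchmany(self, size):
        # Hand back up to `size` rows; an empty list signals exhaustion.
        batch, self._rows = self._rows[:size], self._rows[size:]
        return batch

def result_iter(cursor, batch_size=3):
    """Yield rows one at a time, fetching batch_size rows per round trip."""
    while True:
        results = cursor.fetchmany(batch_size)
        if not results:
            break
        for row in results:
            yield row

print(list(result_iter(FakeCursor(range(10)), batch_size=4)))  # all ten rows, four per fetch
```

Because the generator only ever holds one batch in memory, peak usage stays proportional to batch_size rather than to the full result set, which matches the ~20 MB footprint reported above.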
Re: [PERFORM] Does the psql executable support a fetch many approach when dumping large queries to stdout?
Does the psql executable have any ability to do a fetch many, using a server-side named cursor, when returning results? It seems like it tries to retrieve the query entirely to local memory before printing to standard out. I think it grabs the whole result set to calculate the display column widths. I think there is an option to tweak this but don't remember which, have a look at the psql commands (\?), formatting section.
Re: [PERFORM] Does the psql executable support a fetch many approach when dumping large queries to stdout?
Ah, you're right. Thanks Hannu, that's a good solution. Hannu Krosing wrote: On Fri, 2010-04-02 at 16:28 -0400, Beaver, John E wrote: ... I know that the query used here could have been a COPY statement, which I assume would be better-behaved, but I'm more concerned about the case in which the query is more complex. COPY can copy out results of a SELECT query as well. -- John E. Beaver Bioinformatics Developer Harvard Medical School
Re: [PERFORM] Does the psql executable support a fetch many approach when dumping large queries to stdout?
That makes sense. I'll just use a COPY statement instead like Hannu suggests. Pierre C wrote: Does the psql executable have any ability to do a "fetch many", using a server-side named cursor, when returning results? It seems like it tries to retrieve the query entirely to local memory before printing to standard out. I think it grabs the whole result set to calculate the display column widths. I think there is an option to tweak this but don't remember which, have a look at the psql commands (\?), formatting section. -- John E. Beaver Bioinformatics Developer Harvard Medical School