Re: [PERFORM] Simple (hopefully) throughput question?

2010-11-05 Thread Pierre C
On Thu, 04 Nov 2010 15:42:08 +0100, Nick Matheson nick.d.mathe...@noaa.gov wrote: I think your comments really get at what our working hypothesis was, but given that our experience is limited compared to you all here on the mailing lists, we really wanted to make sure we weren't missing any

Re: [PERFORM] Simple (hopefully) throughput question?

2010-11-05 Thread Robert Klemme
On 11/03/2010 04:52 PM, Nick Matheson wrote: We have an application that needs to do bulk reads of ENTIRE Postgres tables very quickly (i.e. select * from table). We have observed that such sequential scans run two orders of magnitude slower than observed raw disk reads (5 MB/s versus 100

Re: [PERFORM] Simple (hopefully) throughput question?

2010-11-05 Thread Samuel Gendler
On Fri, Nov 5, 2010 at 12:23 PM, Samuel Gendler sgend...@ideasculptor.com wrote: On Thu, Nov 4, 2010 at 8:07 AM, Vitalii Tymchyshyn tiv...@gmail.com wrote: 04.11.10 16:31, Nick Matheson wrote: Heikki- Try COPY, ie. COPY bulk_performance.counts TO STDOUT BINARY. Thanks for the

Re: [PERFORM] Simple (hopefully) throughput question?

2010-11-05 Thread Samuel Gendler
On Thu, Nov 4, 2010 at 8:07 AM, Vitalii Tymchyshyn tiv...@gmail.com wrote: 04.11.10 16:31, Nick Matheson wrote: Heikki- Try COPY, ie. COPY bulk_performance.counts TO STDOUT BINARY. Thanks for the suggestion. A preliminary test shows an improvement closer to our expected 35 MB/s.

Re: [PERFORM] Simple (hopefully) throughput question?

2010-11-04 Thread Pierre C
Is there any way using stored procedures (maybe C code that calls SPI directly) or some other approach to get close to the expected 35 MB/s doing these bulk reads? Or is this the price we have to pay for using SQL instead of some NoSQL solution? (We actually tried Tokyo Cabinet and found it

Re: [PERFORM] Simple (hopefully) throughput question?

2010-11-04 Thread Nick Matheson
Heikki- Try COPY, ie. COPY bulk_performance.counts TO STDOUT BINARY. Thanks for the suggestion. A preliminary test shows an improvement closer to our expected 35 MB/s. Are you familiar with any Java libraries for decoding the COPY format? The spec is clear and we could clearly write our
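
The COPY binary layout Nick refers to is described in the PostgreSQL manual: an 11-byte signature, a 32-bit flags word, a length-prefixed header extension area, then per tuple a 16-bit field count followed by length-prefixed field values (length -1 marks NULL), all in network byte order, with a field count of -1 as the end-of-data trailer. A minimal Java reader along those lines might look like the sketch below; the class name is ours, and interpreting the raw field bytes (int4, float8, text, ...) is left to the caller.

import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public final class PgCopyBinaryReader {
    // 11-byte signature that starts every COPY binary stream.
    private static final byte[] SIGNATURE =
            {'P', 'G', 'C', 'O', 'P', 'Y', '\n', (byte) 0xFF, '\r', '\n', 0};

    private final DataInputStream in;

    public PgCopyBinaryReader(InputStream raw) throws IOException {
        in = new DataInputStream(raw);
        byte[] sig = new byte[SIGNATURE.length];
        in.readFully(sig);
        if (!Arrays.equals(sig, SIGNATURE))
            throw new IOException("not a COPY BINARY stream");
        in.readInt();                  // flags word (bit 16 = OIDs included)
        int extLen = in.readInt();     // header extension area length
        in.skipBytes(extLen);          // ignore any extension data
    }

    // Returns the next tuple as raw per-field byte arrays (null = SQL NULL),
    // or null when the end-of-data trailer (field count -1) is reached.
    public byte[][] readTuple() throws IOException {
        short fieldCount = in.readShort();
        if (fieldCount == -1)
            return null;
        byte[][] fields = new byte[fieldCount][];
        for (int i = 0; i < fieldCount; i++) {
            int len = in.readInt();    // field length in bytes; -1 means NULL
            if (len == -1)
                continue;
            fields[i] = new byte[len];
            in.readFully(fields[i]);
        }
        return fields;
    }
}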

Re: [PERFORM] Simple (hopefully) throughput question?

2010-11-04 Thread Nick Matheson
Marti- Just some ideas that went through my mind when reading your post PostgreSQL 8.3 and later have 22 bytes of overhead per row, plus page-level overhead and internal fragmentation. You can't do anything about row overheads, but you can recompile the server with larger pages to reduce page
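
A quick sanity check on those overhead figures is to divide the on-disk relation size by the row count, which folds in tuple headers, line pointers, page headers, and free space rather than just the user columns. An illustrative JDBC snippet, with placeholder connection details and the table name taken from this thread:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class RowOverhead {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://localhost/test", "user", "password");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT pg_relation_size('bulk_performance.counts') AS bytes,"
                     + " count(*) AS n_rows FROM bulk_performance.counts")) {
            rs.next();
            long bytes = rs.getLong("bytes");
            long rows = rs.getLong("n_rows");
            // Total on-disk size over live rows: includes tuple headers,
            // line pointers, page headers and free space, not just user data.
            System.out.printf("%d bytes, %d rows, %.1f bytes/row%n",
                    bytes, rows, rows == 0 ? 0.0 : (double) bytes / rows);
        }
    }
}

Expect the bytes-per-row figure to come out noticeably larger than the sum of the declared column widths; the difference is the overhead Marti describes.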

Re: [PERFORM] Simple (hopefully) throughput question?

2010-11-04 Thread Nick Matheson
Andy- I have no idea if this would be helpful or not, never tried it, but when you fire off select * from bigtable pg will create the entire resultset in memory (and maybe swap?) and then send it all to the client in one big lump. You might try a cursor and fetch 100-1000 at a time from the
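
One note on Andy's caveat: with the PostgreSQL JDBC driver it is the client that buffers the entire result set by default. The driver only switches to a server-side cursor when autocommit is off, the statement is forward-only (the default), and a fetch size is set. A sketch of that pattern, with placeholder connection details:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CursorRead {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "password")) {
            conn.setAutoCommit(false);       // cursor mode needs an open transaction
            try (Statement st = conn.createStatement()) {
                st.setFetchSize(1000);       // rows fetched per server round trip
                try (ResultSet rs = st.executeQuery(
                        "SELECT * FROM bulk_performance.counts")) {
                    long n = 0;
                    while (rs.next())
                        n++;                 // real code would read the columns here
                    System.out.println(n + " rows streamed");
                }
            }
            conn.commit();
        }
    }
}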

Re: [PERFORM] Simple (hopefully) throughput question?

2010-11-04 Thread Nick Matheson
Pierre- Reading from the tables is very fast, what bites you is that postgres has to convert the data to wire format, send it to the client, and the client has to decode it and convert it to a format usable by your application. Writing a custom aggregate in C should be a lot faster since it

Re: [PERFORM] Simple (hopefully) throughput question?

2010-11-04 Thread Vitalii Tymchyshyn
04.11.10 16:31, Nick Matheson написав(ла): Heikki- Try COPY, ie. COPY bulk_performance.counts TO STDOUT BINARY. Thanks for the suggestion. A preliminary test shows an improvement closer to our expected 35 MB/s. Are you familiar with any Java libraries for decoding the COPY format? The

Re: [PERFORM] Simple (hopefully) throughput question?

2010-11-04 Thread Nick Matheson
, Nick -Original Message- From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Nick Matheson Sent: Wednesday, November 03, 2010 9:53 AM To: pgsql-performance@postgresql.org Subject: [PERFORM] Simple (hopefully) throughput question? Hello

Re: [PERFORM] Simple (hopefully) throughput question?

2010-11-04 Thread Maciek Sakrejda
JDBC driver has some COPY support, but I don't remember details. You'd better ask in the JDBC list. As long as we're here: yes, the JDBC driver has COPY support as of 8.4(?) via the CopyManager PostgreSQL-specific API. You can call ((PGConnection)conn).getCopyAPI() and do either push- or
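
For reference, a minimal push-style use of the pgjdbc CopyManager, with placeholder connection details and output file:

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class CopyOutExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://localhost/test", "user", "password");
             OutputStream out = new BufferedOutputStream(
                     new FileOutputStream("counts.bin"))) {
            // getCopyAPI() is the PGConnection accessor for the CopyManager.
            CopyManager copier = ((PGConnection) conn).getCopyAPI();
            long rows = copier.copyOut(
                    "COPY bulk_performance.counts TO STDOUT BINARY", out);
            System.out.println(rows + " rows copied");
        }
    }
}

The pull-style alternative is copyOut(String), which returns a CopyOut that you drain with readFromCopy().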

Re: [PERFORM] Simple (hopefully) throughput question?

2010-11-04 Thread Nick Matheson
Maciek/Vitalii- Thanks for the pointers to the JDBC work. Luckily, we had already found the COPY support in the pg driver, but were wondering if anyone had already written the complementary unpacking code for the raw data returned from the copy. Again the spec is clear enough that we could

[PERFORM] Simple (hopefully) throughput question?

2010-11-03 Thread Nick Matheson
Hello We have an application that needs to do bulk reads of ENTIRE Postgres tables very quickly (i.e. select * from table). We have observed that such sequential scans run two orders of magnitude slower than observed raw disk reads (5 MB/s versus 100 MB/s). Part of this is due to the storage

Re: [PERFORM] Simple (hopefully) throughput question?

2010-11-03 Thread Heikki Linnakangas
On 03.11.2010 17:52, Nick Matheson wrote: We have an application that needs to do bulk reads of ENTIRE Postgres tables very quickly (i.e. select * from table). We have observed that such sequential scans run two orders of magnitude slower than observed raw disk reads (5 MB/s versus 100 MB/s).

Re: [PERFORM] Simple (hopefully) throughput question?

2010-11-03 Thread Marti Raudsepp
Just some ideas that went through my mind when reading your post. On Wed, Nov 3, 2010 at 17:52, Nick Matheson nick.d.mathe...@noaa.gov wrote: than observed raw disk reads (5 MB/s versus 100 MB/s). Part of this is due to the storage overhead we have observed in Postgres. In the example below,

Re: [PERFORM] Simple (hopefully) throughput question?

2010-11-03 Thread Andy Colson
On 11/3/2010 10:52 AM, Nick Matheson wrote: Hello We have an application that needs to do bulk reads of ENTIRE Postgres tables very quickly (i.e. select * from table). We have observed that such sequential scans run two orders of magnitude slower than observed raw disk reads (5 MB/s versus 100