Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-13 Thread Marti Raudsepp
On Tue, Sep 13, 2011 at 00:26, Robert Klemme shortcut...@googlemail.com wrote: In the case of PG this particular example will work (the leading numbers label the two transactions):
1. TX inserts new PK row
2. TX tries to insert same PK row => blocks
1. TX commits
2. TX fails with PK violation
2. TX does the update (if the error is caught)
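A minimal reproduction of that interleaving, assuming a table t with an integer primary key; these are two concurrent psql sessions, not a single script:

    -- session 1:
    BEGIN;
    INSERT INTO t (id, val) VALUES (1, 'a');

    -- session 2, concurrently:
    BEGIN;
    INSERT INTO t (id, val) VALUES (1, 'b');
    -- blocks here, waiting on session 1's uncommitted row

    -- session 1:
    COMMIT;

    -- session 2 now fails:
    -- ERROR:  duplicate key value violates unique constraint "t_pkey"
    -- if that error is caught, session 2 can issue an UPDATE instead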

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-13 Thread Robert Klemme
On Tue, Sep 13, 2011 at 5:13 PM, Marti Raudsepp ma...@juffo.org wrote: On Tue, Sep 13, 2011 at 00:26, Robert Klemme shortcut...@googlemail.com wrote: In the case of PG this particular example will work: 1. TX inserts new PK row 2. TX tries to insert same PK row => blocks 1. TX commits 2. TX

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-13 Thread Marc Mamin
- From: pgsql-performance-ow...@postgresql.org on behalf of Robert Klemme Sent: Tue 9/13/2011 6:34 To: Marti Raudsepp Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Postgres for a data warehouse, 5-10 TB On Tue, Sep 13, 2011 at 5:13 PM, Marti Raudsepp ma...@juffo.org wrote

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-13 Thread Marti Raudsepp
On Tue, Sep 13, 2011 at 19:34, Robert Klemme shortcut...@googlemail.com wrote: I don't think so. You only need to catch the error (see attachment). Or does this create a subtransaction? Yes, every BEGIN/EXCEPTION block creates a subtransaction -- like a SAVEPOINT it can roll back to in case of an error.
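A sketch of that pattern -- essentially the merge_db example from the PostgreSQL documentation, with placeholder table and column names; each pass through the inner BEGIN/EXCEPTION block is one subtransaction:

    CREATE TABLE counters (key integer PRIMARY KEY, value integer);

    CREATE FUNCTION merge_counter(k integer, v integer) RETURNS void AS $$
    BEGIN
        LOOP
            -- first try to update an existing row
            UPDATE counters SET value = v WHERE key = k;
            IF found THEN
                RETURN;
            END IF;
            -- no row yet: try to insert; a concurrent insert of the same
            -- key raises unique_violation, rolling back only this
            -- subtransaction, not the whole transaction
            BEGIN
                INSERT INTO counters (key, value) VALUES (k, v);
                RETURN;
            EXCEPTION WHEN unique_violation THEN
                NULL;  -- lost the race; loop back and retry the UPDATE
            END;
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;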

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-13 Thread Igor Chudov
I do not need to do insert updates from many threads. I want to do it from one thread. My current MySQL architecture is that I have a table with the same layout as the main one, to hold new and updated objects. When there are enough objects, I begin a big INSERT SELECT ... ON DUPLICATE KEY UPDATE and
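A Postgres translation of that batch (PG had no ON DUPLICATE KEY UPDATE at the time) is an UPDATE of the existing rows followed by an INSERT of the missing ones -- a sketch with assumed table and column names, adequate here precisely because only one thread writes:

    BEGIN;

    -- refresh rows that already exist in the main table
    UPDATE main m
       SET val = s.val
      FROM staging s
     WHERE m.id = s.id;

    -- add the rows that are genuinely new
    INSERT INTO main (id, val)
    SELECT s.id, s.val
      FROM staging s
     WHERE NOT EXISTS (SELECT 1 FROM main m WHERE m.id = s.id);

    TRUNCATE staging;
    COMMIT;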

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-13 Thread Stefan Keller
Interesting debate. 2011/9/13 Marti Raudsepp ma...@juffo.org: Yes, every BEGIN/EXCEPTION block creates a subtransaction -- like a SAVEPOINT it can roll back to in case of an error. Are you sure? In theory I always understood that there are no subtransactions. In fact when looking at the docs

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-13 Thread Scott Marlowe
On Tue, Sep 13, 2011 at 12:57 PM, Stefan Keller sfkel...@gmail.com wrote: Are you sure? In theory I always understood that there are no subtransactions. A subtransaction is just another way of saying savepoints / rollback.
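Shown at the SQL level, assuming a table t with a primary key:

    BEGIN;
    INSERT INTO t (id) VALUES (1);
    SAVEPOINT sp;
    INSERT INTO t (id) VALUES (1);   -- fails: duplicate key
    ROLLBACK TO SAVEPOINT sp;        -- undoes only the failed insert
    COMMIT;                          -- the first insert commits fine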

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-13 Thread Robert Klemme
On 13.09.2011 20:11, Marti Raudsepp wrote: On Tue, Sep 13, 2011 at 19:34, Robert Klemme shortcut...@googlemail.com wrote: I don't think so. You only need to catch the error (see attachment). Or does this create a subtransaction? Yes, every BEGIN/EXCEPTION block creates a subtransaction --

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-13 Thread Robert Klemme
On 13.09.2011 20:57, Stefan Keller wrote: Interesting debate. Indeed. 2011/9/13 Marti Raudsepp ma...@juffo.org: Yes, every BEGIN/EXCEPTION block creates a subtransaction -- like a SAVEPOINT it can roll back to in case of an error. Are you sure? In theory I always understood that there are no subtransactions.

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-13 Thread Kevin Grittner
Robert Klemme wrote: On 12.09.2011 19:22, Andy Colson wrote: There are transaction isolation levels, but they are like playing with fire (in my opinion). You make them sound like witchcraft. But they are clearly defined - even standardized. Yeah, for decades. Developing concurrency

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-13 Thread Gianni Ciolli
On Mon, Sep 12, 2011 at 11:26:10PM +0200, Robert Klemme wrote: You make them sound like witchcraft. But they are clearly defined - even standardized. Granted, different RDBMS might implement them in different ways - here's PG's view of TX isolation:
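For reference, the standardized levels are requested per transaction (or as a session default) in PG like this:

    BEGIN;
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    -- ... statements that need a fully consistent view ...
    COMMIT;

    -- or as a session-wide default:
    SET default_transaction_isolation = 'repeatable read';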

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-12 Thread Shaun Thomas
On 09/11/2011 09:44 AM, Claudio Freire wrote: And Andy is right, you'll have a lot less space. If raid 10 doesn't give you enough room, just leave two spare drives for a raid 0 temporary partition. That will be at least twice as fast as doing temporary tables on the raid 6. Alternatively,

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-12 Thread Shaun Thomas
On 09/11/2011 12:02 PM, Marti Raudsepp wrote: Which brings me to another important point: don't do lots of small write transactions, SAVEPOINTs or PL/pgSQL subtransactions. Besides being inefficient, they introduce a big maintenance burden. I'd like to second this. Before a notable

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-12 Thread Shaun Thomas
On 09/11/2011 09:27 AM, Claudio Freire wrote: I have used Slony to do database migration. It is a pain to set up, but it saves you hours of downtime. I've had to shoot this option down in two separate upgrade scenarios in two different companies. Theoretically it's possible, but Slony is

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-12 Thread Robert Klemme
On 11.09.2011 22:10, Scott Marlowe wrote: Another data point. We had a big Oracle installation at my last job, and OLAP queries were killing it midday, so I built a simple replication system to grab rows from the big iron Oracle SUN box and shove into a single core P IV 2.xGHz machine with 4

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-12 Thread Robert Klemme
On 11.09.2011 19:02, Marti Raudsepp wrote: On Sun, Sep 11, 2011 at 17:23, Andy Colson a...@squeakycode.net wrote: On 09/11/2011 08:59 AM, Igor Chudov wrote: By the way, does that INSERT UPDATE functionality or something like this exist in Postgres? You have two options: 1) write a function

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-12 Thread Andy Colson
On 9/12/2011 12:15 PM, Robert Klemme wrote: On 11.09.2011 19:02, Marti Raudsepp wrote: On Sun, Sep 11, 2011 at 17:23, Andy Colson a...@squeakycode.net wrote: On 09/11/2011 08:59 AM, Igor Chudov wrote: By the way, does that INSERT UPDATE functionality or something like this exist in Postgres?

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-12 Thread Scott Marlowe
On Mon, Sep 12, 2011 at 11:04 AM, Robert Klemme shortcut...@googlemail.com wrote: On 11.09.2011 22:10, Scott Marlowe wrote: Another data point.  We had a big Oracle installation at my last job, and OLAP queries were killing it midday, so I built a simple replication system to grab rows from

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-12 Thread Shaun Thomas
On 09/12/2011 02:48 PM, Scott Marlowe wrote: I put it to you that your hardware has problems if you have a pg db that's corrupting from having too much vacuum activity. What? No. We optimized by basically forcing autovacuum to never run during our active periods. We never actually

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-12 Thread Scott Marlowe
On Mon, Sep 12, 2011 at 10:22 AM, Shaun Thomas stho...@peak6.com wrote: On 09/11/2011 12:02 PM, Marti Raudsepp wrote: Which brings me to another important point: don't do lots of small write transactions, SAVEPOINTs or PL/pgSQL subtransactions. Besides being inefficient, they introduce a big

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-12 Thread Marti Raudsepp
On Mon, Sep 12, 2011 at 23:04, Shaun Thomas stho...@peak6.com wrote: I was alluding to the fact that if a DBA had his system running for a week at our transaction level, and PG didn't have forced auto vacuum, and their maintenance lapsed even slightly, they could end up with a corrupt database.

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-12 Thread Scott Marlowe
On Mon, Sep 12, 2011 at 2:04 PM, Shaun Thomas stho...@peak6.com wrote: On 09/12/2011 02:48 PM, Scott Marlowe wrote: I put it to you that your hardware has problems if you have a pg db that's corrupting from having too much vacuum activity. What? No. We optimized by basically forcing

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-12 Thread Shaun Thomas
On 09/12/2011 03:44 PM, Scott Marlowe wrote: The PostgreSQL team works REALLY hard to prevent any kind of corruption scenario from rearing its ugly head, so when the word corruption pops up I start to wonder about the system (hardware wise) someone is using, You've apparently never used

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-12 Thread Scott Marlowe
On Mon, Sep 12, 2011 at 2:55 PM, Shaun Thomas stho...@peak6.com wrote: On 09/12/2011 03:44 PM, Scott Marlowe wrote: The PostgreSQL team works REALLY hard to prevent any kind of corruption scenario from rearing its ugly head, so when the word corruption pops up I start to wonder about the

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-12 Thread Robert Klemme
On 12.09.2011 19:22, Andy Colson wrote: On 9/12/2011 12:15 PM, Robert Klemme wrote: On 11.09.2011 19:02, Marti Raudsepp wrote: On Sun, Sep 11, 2011 at 17:23, Andy Colson a...@squeakycode.net wrote: On 09/11/2011 08:59 AM, Igor Chudov wrote: By the way, does that INSERT UPDATE functionality or

[PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-11 Thread Igor Chudov
I have been a MySQL user for years, including owning a few multi-gigabyte databases for my websites, and using it to host algebra.com (about 12 GB database). I have had my ups and downs with MySQL. The ups were ease of use and decent performance for small databases such as algebra.com. The downs

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-11 Thread Scott Marlowe
On Sun, Sep 11, 2011 at 6:35 AM, Igor Chudov ichu...@gmail.com wrote: I have a server with about 18 TB of storage and 48 GB of RAM, and 12 CPU cores. 1 or 2 fast cores is plenty for what you're doing. But the drive array and how it's configured etc are very important. There's a huge

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-11 Thread pasman pasmański
For a 10 TB table and 3 hours, disks should have a transfer rate of about 1 GB/s (seqscan). 2011/9/11, Scott Marlowe scott.marl...@gmail.com: On Sun, Sep 11, 2011 at 6:35 AM, Igor Chudov ichu...@gmail.com wrote: I have a server with about 18 TB of storage and 48 GB of RAM, and 12 CPU cores. 1 or 2 fast
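The arithmetic behind that estimate:

    10 TB / 3 h = 10,000 GB / 10,800 s ≈ 0.93 GB/s sustained sequential read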

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-11 Thread Igor Chudov
On Sun, Sep 11, 2011 at 7:52 AM, Scott Marlowe scott.marl...@gmail.com wrote: On Sun, Sep 11, 2011 at 6:35 AM, Igor Chudov ichu...@gmail.com wrote: I have a server with about 18 TB of storage and 48 GB of RAM, and 12 CPU cores. 1 or 2 fast cores is plenty for what you're doing. I need

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-11 Thread Igor Chudov
2011/9/11 pasman pasmański pasma...@gmail.com: For a 10 TB table and 3 hours, disks should have a transfer rate of about 1 GB/s (seqscan). I have 6 Gb/s disk drives, so it should not be too far off, maybe 5 hours for a seqscan. 2011/9/11, Scott Marlowe scott.marl...@gmail.com: On Sun, Sep 11, 2011 at

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-11 Thread Claudio Freire
On Sun, Sep 11, 2011 at 3:59 PM, Igor Chudov ichu...@gmail.com wrote: Well, right now, my server has twelve 7,200 RPM 2TB hard drives in a RAID-6 configuration. They are managed by a 3WARE 9750 RAID CARD. I would say that I am not very concerned with linear relationship of read speed to disk

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-11 Thread Andy Colson
On 09/11/2011 07:35 AM, Igor Chudov wrote: I have been a MySQL user for years, including owning a few multi-gigabyte databases for my websites, and using it to host algebra.com (about 12 GB database). I have had my ups and downs with MySQL. The ups were ease of use and decent

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-11 Thread Igor Chudov
On Sun, Sep 11, 2011 at 9:16 AM, Claudio Freire klaussfre...@gmail.com wrote: On Sun, Sep 11, 2011 at 3:59 PM, Igor Chudov ichu...@gmail.com wrote: Well, right now, my server has twelve 7,200 RPM 2TB hard drives in a RAID-6 configuration. They are managed by a 3WARE 9750 RAID CARD. I

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-11 Thread Andy Colson
On 09/11/2011 08:59 AM, Igor Chudov wrote: I do not plan to do a lot of random writing. My current design is that my perl scripts write to a temporary table every week, and then I do INSERT..ON DUPLICATE KEY UPDATE. By the way, does that INSERT UPDATE functionality or something like this

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-11 Thread Claudio Freire
On Sun, Sep 11, 2011 at 4:16 PM, Andy Colson a...@squeakycode.net wrote: Upgrading to major versions of PG may or may not be painful. (mysql sometimes works seamlessly between versions, it appears brilliant. But I have had problems with an update, and when it goes bad, you don't have a lot of

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-11 Thread Andy Colson
On 09/11/2011 09:21 AM, Igor Chudov wrote: On Sun, Sep 11, 2011 at 9:16 AM, Claudio Freire klaussfre...@gmail.com wrote: On Sun, Sep 11, 2011 at 3:59 PM, Igor Chudov ichu...@gmail.com wrote: Well, right now, my server has

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-11 Thread Claudio Freire
On Sun, Sep 11, 2011 at 4:21 PM, Igor Chudov ichu...@gmail.com wrote: Quantitatively, what would you say is the write speed difference between RAID 10 and RAID 6? https://support.nstein.com/blog/archives/73 There you can see a comparison with 4 drives, and raid 10 is twice as fast. Since raid

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-11 Thread Marti Raudsepp
On Sun, Sep 11, 2011 at 17:23, Andy Colson a...@squeakycode.net wrote: On 09/11/2011 08:59 AM, Igor Chudov wrote: By the way, does that INSERT UPDATE functionality or something like this exist in Postgres? You have two options: 1) write a function like: create function doinsert(_id integer,
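The quoted fragment cuts off mid-definition; a minimal completion of such a function might look like this (the second parameter and the table name are invented for illustration, and note that without an EXCEPTION block this is only safe when a single writer is guaranteed):

    create function doinsert(_id integer, _value text) returns void as $$
    begin
        update mytable set value = _value where id = _id;
        if not found then
            insert into mytable (id, value) values (_id, _value);
        end if;
    end;
    $$ language plpgsql;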

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-11 Thread Ogden
On Sep 11, 2011, at 9:21 AM, Igor Chudov wrote: On Sun, Sep 11, 2011 at 9:16 AM, Claudio Freire klaussfre...@gmail.com wrote: On Sun, Sep 11, 2011 at 3:59 PM, Igor Chudov ichu...@gmail.com wrote: Well, right now, my server has twelve 7,200 RPM 2TB hard drives in a RAID-6

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-11 Thread J Sisson
On Sun, Sep 11, 2011 at 1:36 PM, Ogden li...@darkstatic.com wrote: As someone who migrated a RAID 5 installation to RAID 10, I am getting far better read and write performance on heavy calculation queries. Writing on the RAID 5 really made things crawl. For lots of writing, I think RAID 10 is

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-11 Thread J Sisson
Sorry, meant to send this to the list. For really big data-warehousing, this document really helped us: http://pgexperts.com/document.html?id=49

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-11 Thread Scott Marlowe
2011/9/11 pasman pasmański pasma...@gmail.com: For 10 TB table and 3hours, disks should have a transfer about 1GB/s (seqscan). Random data point. Our primary servers were built for OLTP with 48 cores and 32 15kSAS drives. We started out on Arecas but the Supermicro 1Us we were using didn't

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-11 Thread Stephen Frost
* Igor Chudov (ichu...@gmail.com) wrote: Right now I have a personal (one user) project to create a 5-10 Terabyte data warehouse. The largest table will consume the most space and will take, perhaps, 200,000,000 rows. I run data-warehouse databases on that order (current largest single

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-11 Thread Igor Chudov
On Sun, Sep 11, 2011 at 6:01 PM, Stephen Frost sfr...@snowman.net wrote: * Igor Chudov (ichu...@gmail.com) wrote: Right now I have a personal (one user) project to create a 5-10 Terabyte data warehouse. The largest table will consume the most space and will take, perhaps, 200,000,000 rows.

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-11 Thread Claudio Freire
On Mon, Sep 12, 2011 at 1:16 AM, Igor Chudov ichu...@gmail.com wrote: I could, say, set work_mem to 30 GB? (64 bit linux) I don't think you'd want that. Remember, work_mem is the amount of memory *per sort*. Queries can request several times that much memory, once per sort they need to perform.
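A common compromise is a conservative global value plus a per-session override for the handful of big warehouse queries -- a sketch:

    -- postgresql.conf: a safe global default (applies per sort operation)
    -- work_mem = 64MB

    -- then, only in the session running the heavy query:
    SET work_mem = '1GB';
    SELECT ...;      -- the big sort/hash now fits in memory
    RESET work_mem;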

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-11 Thread Stephen Frost
* Igor Chudov (ichu...@gmail.com) wrote: Can I partition data by month (or quarter), without that month being part of PRIMARY KEY? The way partitioning works in PG is by using CHECK constraints. Not sure if you're familiar with those (not sure if MySQL has them), so here's a quick example:
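A sketch of the pattern Stephen goes on to show (table and column names assumed), using inheritance plus CHECK constraints -- the way partitioning was done in PG at the time; note the date column need not be part of any primary key:

    CREATE TABLE parent (
        id    bigint,
        date  date,
        val   numeric
    );

    CREATE TABLE table1 (
        CHECK (date >= DATE '2009-10-01' AND date < DATE '2010-01-01')
    ) INHERITS (parent);

    CREATE TABLE table2 (
        CHECK (date >= DATE '2010-01-01' AND date < DATE '2010-04-01')
    ) INHERITS (parent);

    -- rows are routed into the children by the application,
    -- by rules, or by a trigger on the parent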

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-11 Thread Stephen Frost
* Claudio Freire (klaussfre...@gmail.com) wrote: I don't think you'd want that. Remember, work_mem is the amount of memory *per sort*. Queries can request several times that much memory, once per sort they need to perform. You can set it really high, but not 60% of your RAM - that wouldn't

Re: [PERFORM] Postgres for a data warehouse, 5-10 TB

2011-09-11 Thread Ondrej Ivanič
Hi, On 12 September 2011 12:28, Stephen Frost sfr...@snowman.net wrote: Once those are done, you can query against the 'parent' table with something like: select * from parent where date = '2010-01-01'; And PG will realize it only has to look at table2 to get the results for that query.
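That pruning is constraint exclusion at work; with the quarterly children sketched above, it can be seen directly in the plan:

    SET constraint_exclusion = partition;  -- the default from 8.4 onward
    EXPLAIN SELECT * FROM parent WHERE date = DATE '2010-01-01';
    -- the plan scans only parent (normally empty) and table2;
    -- table1 is skipped because its CHECK constraint excludes that date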