[PERFORM] Slow BLOB restore
I have encountered a problem while restoring the database. There is a table that contains XML data (BLOB), ~3,000,000 records, ~5.5 GB of data. pg_restore has been running for a week without any considerable progress. There are plenty of lines like these in the log:

pg_restore: processing item 3125397 BLOB 10001967
pg_restore: executing BLOB 10001967

CPU usage is always at 100%. The total database size is about 100 GB, and it restores in an hour or so without BLOBs.
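One cheap way to tell whether the restore is creating large objects at all is to count them from another session. A minimal libpq sketch - pg_largeobject_metadata assumes PostgreSQL 9.0 (on 8.x you would count DISTINCT loid in pg_largeobject instead), and the connection string is a placeholder:

#include <stdio.h>
#include <libpq-fe.h>

/* Poll how many large objects exist so far while pg_restore runs.
 * This only gauges progress; it is a sketch, not a fix. */
int
main(void)
{
    PGconn *conn = PQconnectdb("dbname=mydb user=postgres"); /* placeholder */
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    PGresult *res = PQexec(conn,
                           "SELECT count(*) FROM pg_largeobject_metadata");
    if (PQresultStatus(res) == PGRES_TUPLES_OK)
        printf("large objects restored so far: %s\n", PQgetvalue(res, 0, 0));

    PQclear(res);
    PQfinish(conn);
    return 0;
}

Run repeatedly (say, once a minute), a rising count at least confirms the restore is moving rather than stuck.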
Re: [PERFORM] Performance under contention
Robert Haas robertmh...@gmail.com writes:
> I wonder if it would be possible to have a very short critical section
> where we grab the partition lock, acquire the heavyweight lock, and
> release the partition lock; and then only as a second step record (in
> the form of a PROCLOCK) the fact that we got it.

[ confused... ] Exactly what do you suppose "acquire the lock" would be represented as, if not "create a PROCLOCK entry attached to it"?

In any case, I think this is another example of not understanding where the costs really are. As far as I can tell, on modern MP systems much of the elapsed time in these operations comes from acquiring exclusive access to shared-memory cache lines. Reducing the number of changes you have to make within a small area of shared memory won't save much, once you've paid for the first one. Changing structures that aren't heavily contended (such as a proc's list of its own locks) doesn't cost much at all.

One thing that might be interesting, but that I don't know how to attack in a reasonably machine-independent way, is to try to ensure that shared and local data structures don't accidentally overlap within cache lines. When they do, you pay for fighting the cache line away from another processor even when there's no real need.

regards, tom lane
Re: [PERFORM] Performance under contention
Hi Tom,

I suspect I may be missing something here, but I think it's a pretty universal truism that cache lines are aligned to power-of-2 memory addresses, so it would suffice to ensure during setup that the lower-order n bits of the object address are all zeros for each critical object; if the malloc() routine being used doesn't support that, it could be done by allocating a slightly larger than necessary block of memory and choosing a location within that. The value of n could be architecture dependent, but n=8 would cover everyone, hopefully without wasting too much RAM.

Cheers
Dave

On Tue, Dec 7, 2010 at 11:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> One thing that might be interesting, but that I don't know how to
> attack in a reasonably machine-independent way, is to try to ensure
> that shared and local data structures don't accidentally overlap
> within cache lines. When they do, you pay for fighting the cache line
> away from another processor even when there's no real need.
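As a concrete illustration of the over-allocate-and-choose-a-location approach: generic C, not the PostgreSQL shared-memory allocator (which would have to layer this over ShmemAlloc), with n=8 giving the 256-byte alignment Dave suggests. posix_memalign() does the same job where available.

#include <stdint.h>
#include <stdlib.h>

#define CACHE_ALIGN 256  /* 2^8: low-order 8 bits of the address are zero */

/* Allocate size bytes aligned to a CACHE_ALIGN boundary by over-allocating
 * and rounding the address up.  The original malloc pointer is stashed just
 * before the aligned block so the memory can be freed later. */
static void *
alloc_cache_aligned(size_t size)
{
    void *raw = malloc(size + CACHE_ALIGN + sizeof(void *));
    if (raw == NULL)
        return NULL;
    uintptr_t base = (uintptr_t) raw + sizeof(void *);
    uintptr_t aligned = (base + CACHE_ALIGN - 1) & ~((uintptr_t) (CACHE_ALIGN - 1));
    ((void **) aligned)[-1] = raw;      /* remember what to free */
    return (void *) aligned;
}

static void
free_cache_aligned(void *p)
{
    free(((void **) p)[-1]);
}

With this, two structures allocated separately can never share a cache line (assuming lines of 256 bytes or less), at a cost of up to 256 + sizeof(void *) wasted bytes per object.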
[PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows
We are in the process of deciding how to proceed on a database upgrade. We currently have MS SQL 2000 running on Windows 2003 (on my test server). I was shocked at the cost of MS SQL 2008 R2 for a new server (2-CPU license). I started comparing DBs and came across PostgreSQL. It seemed to be exactly what I was after. All of our programming is in ASP.net. Since I am running MSSQL 2000 I get no benefit from .NET integration, so it is not a concern.

I ran a head-to-head test of MS SQL 2000 and PostgreSQL 9.0. Both are running on Windows 2003. What I found was quite surprising and I am wondering if anyone can point out what is going on here. Here is the test I ran. I created 2 tables; the main table had 5 fields with a serial ID field. The second table linked to table 1 for a state field.

I had ASP.net via MSSQL create 1,000 records in the main table. It took 9.85 seconds to complete. Next I had ASP.net via PostgreSQL create 1,000 records. It took 0.65625 seconds. PostgreSQL smoked MS SQL Server on that test.

The next test was to use ASP.net to join all 1,000 rows with table 2 and then display the text out. MS SQL took 0.76 seconds to display:

select name,address,city,state,statename,stateid,other from pgtemp1 left join pgtemp2 on state=stateid

Then I did the same test via PostgreSQL and it took 8.85 seconds! I tried it again as I thought I had done something wrong. I did a few tweaks such as increasing the shared buffers. Still, the best I could get it to was 7.5 seconds. This is insanely slow compared to MSSQL 2000. What am I missing? Here is my SQL statement for PostgreSQL:

select name,address,city,state,statename,stateid,other from pgtemp1 left join pgtemp2 on state=stateid

Any ideas on why the Postgres server is so much slower on the joins? I am trying to understand what is going on here, so please don't flame me. Any advice is appreciated.

Thanks,
Tom Polak
Rockford Area Association of Realtors
Re: [PERFORM] Performance under contention
On 7 December 2010 18:37, Robert Haas robertmh...@gmail.com wrote:
> On Mon, Dec 6, 2010 at 9:59 PM, Jignesh Shah jks...@gmail.com wrote:
>> That's exactly what I concluded when I was doing the sysbench simple
>> read-only test. I had also tried with different lock partitions and it
>> did not help since they all go after the same table. I think one way
>> to kind of avoid the problem on the same table is to do more granular
>> locking (maybe at page level instead of table level). But then I don't
>> really understand how to even create a prototype for this one. If you
>> can help create a prototype then I can test it out with my setup and
>> see if it helps us to catch up with the other guys out there.
>
> We're trying to lock the table against a concurrent DROP or schema
> change, so locking only part of it doesn't really work. I don't really
> see any way to avoid needing some kind of a lock here; the trick is
> how to take it quickly. The main obstacle to making this faster is
> that the deadlock detector needs to be able to obtain enough
> information to break cycles, which means we've got to record in shared
> memory not only the locks that are granted but who has them.

I'm not very familiar with PostgreSQL code but if we're brainstorming... if you're only trying to protect against a small number of expensive operations (like DROP, etc.) that don't really happen often, wouldn't an atomic reference counter be good enough for the purpose (e.g. the expensive operations would spin-wait until the counter is 0)?
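To make the proposal concrete, here is a minimal C11 sketch of such a counter. Hypothetical code, not PostgreSQL's lock manager - and, as the replies below point out, it busy-waits and offers no deadlock detection:

#include <stdatomic.h>
#include <sched.h>

/* Readers of a table take a cheap shared "lock" by bumping a counter;
 * rare expensive operations (DROP, ALTER) wait for it to reach zero. */
typedef struct
{
    atomic_int readers;
} RefGuard;

static void
guard_enter_shared(RefGuard *g)
{
    atomic_fetch_add(&g->readers, 1);
}

static void
guard_leave_shared(RefGuard *g)
{
    atomic_fetch_sub(&g->readers, 1);
}

/* Expensive path: spin until no shared holders remain.  Two known
 * problems, raised downthread: the wait can last as long as the longest
 * reader (minutes, if a reader is scanning 10GB), and there is no
 * deadlock detection.  Also simplified: nothing here stops new readers
 * arriving after the count hits zero, so a real version would need a
 * writer-pending flag to avoid starvation. */
static void
guard_enter_exclusive(RefGuard *g)
{
    while (atomic_load(&g->readers) != 0)
        sched_yield();
}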
Re: [PERFORM] Performance under contention
On Tue, Dec 7, 2010 at 12:50 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> I wonder if it would be possible to have a very short critical section
>> where we grab the partition lock, acquire the heavyweight lock, and
>> release the partition lock; and then only as a second step record (in
>> the form of a PROCLOCK) the fact that we got it.
>
> [ confused... ] Exactly what do you suppose "acquire the lock" would be
> represented as, if not "create a PROCLOCK entry attached to it"?

Update the granted array and, if necessary, the grantMask.

> In any case, I think this is another example of not understanding
> where the costs really are.

Possible.

> As far as I can tell, on modern MP systems much of the elapsed time in
> these operations comes from acquiring exclusive access to
> shared-memory cache lines. Reducing the number of changes you have to
> make within a small area of shared memory won't save much, once you've
> paid for the first one.

Seems reasonable.

> Changing structures that aren't heavily contended (such as a proc's
> list of its own locks) doesn't cost much at all.

I'm not sure where you're getting the idea that a proc's list of its own locks isn't heavily contended. That could be true, but it isn't obvious to me. We allocate PROCLOCK structures out of a shared hash table while holding the lock manager partition lock, and we add every lock to a queue associated with the PROC and a second queue associated with the LOCK. So if two processes acquire an AccessShareLock on the same table, both the LOCK object and at least the SHM_QUEUE portions of each PROCLOCK are shared, and those aren't necessarily nearby in memory.

> One thing that might be interesting, but that I don't know how to
> attack in a reasonably machine-independent way, is to try to ensure
> that shared and local data structures don't accidentally overlap
> within cache lines. When they do, you pay for fighting the cache line
> away from another processor even when there's no real need.

I'd be sort of surprised if this is a problem - as I understand it, cache lines are small, contiguous chunks, and surely the heap and the shared memory segment are mapped into different portions of the address space...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] Performance under contention
On Tue, Dec 7, 2010 at 1:08 PM, Ivan Voras ivo...@freebsd.org wrote:
> [snip]
> I'm not very familiar with PostgreSQL code but if we're
> brainstorming... if you're only trying to protect against a small
> number of expensive operations (like DROP, etc.) that don't really
> happen often, wouldn't an atomic reference counter be good enough for
> the purpose (e.g. the expensive operations would spin-wait until the
> counter is 0)?

No, because (1) busy-waiting is only suitable for locks that will only be held for a short time, and an AccessShareLock on a table might be held while we read 10GB of data in from disk, and (2) that wouldn't allow for deadlock detection.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows
Tom Polak t...@rockfordarearealtors.org wrote:
> the best I could get it to was 7.5 seconds.
>
> select name,address,city,state,statename,stateid,other from pgtemp1
> left join pgtemp2 on state=stateid

We'd need a lot more information. Please read this and post again:

http://wiki.postgresql.org/wiki/SlowQueryQuestions

Be sure to include hardware info, postgresql.conf settings (excluding comments), table layouts including indexes and constraints, and the results of:

EXPLAIN ANALYZE select ...

-Kevin
Re: [PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows
On 12/7/2010 11:34 AM, Tom Polak wrote:
> [snip]
> I had ASP.net via MSSQL create 1,000 records in the main table. Took
> 9.85 seconds to complete. Next I had ASP.net via Postgresql create
> 1,000 records. Took .65625 seconds. Postgresql smoked MS SQL server on
> that test.

Did you play with the postgresql.conf file? Maybe turn off fsync? I'd guess the above is mssql flushing to disk while PG isn't.

> Next test is to use ASP.net and join all 1,000 rows with table 2 and
> then display the text out. MS SQL took 0.76 seconds to display.
> Then I did the same test via Postgresql and it took 8.85 seconds!
> [snip]
> Any ideas on why the Postgres server is so much slower on the joins?

Did you create an index? That'd be my first guess.

Also, can you run the sql from the command line client (psql) and see if it takes that long? While you're in psql, stick an 'explain analyze' in front of your query, and let's see its output.

Also, as a fair warning: mssql doesn't really care about transactions, but PG really does. Make sure all your code is properly starting and committing transactions.

-Andy
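As a side note, EXPLAIN ANALYZE returns its plan as ordinary result rows, so it can also be captured from client code rather than psql. A minimal libpq sketch, assuming an already-open connection:

#include <stdio.h>
#include <libpq-fe.h>

/* Print the EXPLAIN ANALYZE plan for the problem query; each plan line
 * comes back as one result row.  Assumes conn is already open. */
static void
print_plan(PGconn *conn)
{
    PGresult *res = PQexec(conn,
        "EXPLAIN ANALYZE "
        "SELECT name, address, city, state, statename, stateid, other "
        "FROM pgtemp1 LEFT JOIN pgtemp2 ON state = stateid");

    if (PQresultStatus(res) == PGRES_TUPLES_OK)
    {
        for (int i = 0; i < PQntuples(res); i++)
            printf("%s\n", PQgetvalue(res, i, 0));
    }
    else
        fprintf(stderr, "EXPLAIN failed: %s", PQerrorMessage(conn));
    PQclear(res);
}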
Re: [PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows
On Tuesday 07 December 2010 18:34:25 Tom Polak wrote:
> Then I did the same test via Postgresql and it took 8.85 seconds! I
> tried it again as I thought I did something wrong. I did a few tweaks
> such as increasing the shared buffers. Still the best I could get it
> to was 7.5 seconds. This is insanely slow compared to MSSQL 2000. What
> am I missing? Here is my SQL statement for postgresql:
> select name,address,city,state,statename,stateid,other from pgtemp1
> left join pgtemp2 on state=stateid

I think you should at least provide the exact schema and possibly some example data (pg_dump) to get us somewhere. I would suggest you post the output of EXPLAIN ANALYZE $yourquery - that gives us information about how that query was executed.

Greetings,

Andres
Re: [PERFORM] Performance under contention
On 7 December 2010 19:10, Robert Haas robertmh...@gmail.com wrote:
>> I'm not very familiar with PostgreSQL code but if we're
>> brainstorming... if you're only trying to protect against a small
>> number of expensive operations (like DROP, etc.) that don't really
>> happen often, wouldn't an atomic reference counter be good enough for
>> the purpose (e.g. the expensive operations would spin-wait until the
>> counter is 0)?
>
> No, because (1) busy-waiting is only suitable for locks that will only
> be held for a short time, and an AccessShareLock on a table might be
> held while we read 10GB of data in from disk,

Generally yes, but a variant with adaptive sleeping could possibly be used, if it would be acceptable to delay (by an uncertain amount) the already expensive and rare operations.

> and (2) that wouldn't allow for deadlock detection.

Probably :)
Re: [PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows
On 12/7/10 9:34 AM, Tom Polak wrote:
> [snip]
> Here is the test I ran. I created 2 tables, the main table had 5
> fields with a serial ID field. The second table linked to table 1 for
> a state field.

Did you run ANALYZE on the database after creating it and loading the data? If not, do it and try again (along with the other suggestions you'll get here). ANALYZE gathers the statistics that allow the planner to do its job. Without statistics, all bets are off.

Craig
Re: [PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows
On 12/7/2010 1:22 PM, Justin Pitts wrote:
>> Also, as a fair warning: mssql doesn't really care about
>> transactions, but PG really does. Make sure all your code is properly
>> starting and committing transactions.
>> -Andy
>
> I do not understand that statement. Can you explain it a bit better?

In mssql you can write code that connects to the db, fires off updates and inserts, and then disconnects. I believe mssql will keep all your changes, and the transaction stuff is done for you.

In PG the first statement you fire off (like an "insert into" for example) will start a transaction. If you don't commit before you disconnect, that transaction will be rolled back. Even worse, if your program does not commit, but keeps the connection to the db open, the transaction will stay open too.

There are differences in the way mssql and pg do transactions. mssql uses a transaction log and keeps current data in the table. In mssql if you open a transaction and write a bunch of stuff, the table contains that new stuff and everyone can see it. (I think the default transaction isolation level is read committed.) But if you set your isolation level to something with repeatable read, then your program will block and have to wait on every little change to the table. (Or, probably, page... I think mssql has page-level locking?)

Anyway, in PG, multiple versions of the same row are kept, and when you open, and keep open, a transaction, PG has to keep a version of the row for every change that other people make. So a long-lasting transaction could create hundreds of versions of one row. Then when somebody goes to select against that table, it has to scan not only the rows, but every version of every row!

So my point is: in PG, use transactions as they were meant to be used, as single atomic operations. Start, do some work, commit. mssql made it easy to ignore transactions by doing it for you. Ignoring transactions in PG will hurt you.

You can google MVCC and "postgres idle in transaction" for more.

-Andy
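Concretely, the start-work-commit pattern looks like this from C with libpq - a minimal sketch, with the table and connection string borrowed from the test upthread (the connection string is a placeholder):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn *conn = PQconnectdb("dbname=tomtemp user=postgres"); /* placeholder */
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    PGresult *res = PQexec(conn, "BEGIN");
    PQclear(res);

    res = PQexec(conn,
                 "INSERT INTO pgtemp2(statename, stateid, other) "
                 "VALUES ('Illinois', 1, 'example row')");
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
    {
        PQclear(res);
        /* roll back rather than leaving the session idle in transaction */
        PQclear(PQexec(conn, "ROLLBACK"));
        PQfinish(conn);
        return 1;
    }
    PQclear(res);

    PQclear(PQexec(conn, "COMMIT"));    /* the work is now durable */
    PQfinish(conn);
    return 0;
}

The key point is the explicit COMMIT (or ROLLBACK) on every path out of the function - nothing is left open when the connection goes back to a pool.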
Re: [PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows
On Tue, Dec 7, 2010 at 11:43 AM, Andy Colson a...@squeakycode.net wrote:
> In PG the first statement you fire off (like an "insert into" for
> example) will start a transaction. If you dont commit before you
> disconnect that transaction will be rolled back. Even worse, if your
> program does not commit, but keeps the connection to the db open, the
> transaction will stay open too.

Huh - is this new? I always thought that every statement was wrapped in its own transaction unless you explicitly start your own. So you shouldn't need to commit before closing a connection if you never opened a transaction to begin with.

-- 
Regards,
Richard Broersma Jr.
Re: [PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows
On Tue, Dec 07, 2010 at 11:56:51AM -0800, Richard Broersma wrote:
> On Tue, Dec 7, 2010 at 11:43 AM, Andy Colson a...@squeakycode.net wrote:
>> In PG the first statement you fire off (like an "insert into" for
>> example) will start a transaction. If you dont commit before you
>> disconnect that transaction will be rolled back. Even worse, if your
>> program does not commit, but keeps the connection to the db open, the
>> transaction will stay open too.
>
> Huh - is this new? I always thought that every statement was wrapped
> in its own transaction unless you explicitly start your own. So you
> shouldn't need to commit before closing a connection if you never
> opened a transaction to begin with.

The default of autocommit unless explicitly starting a transaction with BEGIN is the normal behavior that I have seen as well.

Cheers,
Ken
Re: [PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows
On 07/12/2010 7:43 PM, Andy Colson wrote:
> [snip]
> In PG the first statement you fire off (like an "insert into" for
> example) will start a transaction. If you dont commit before you
> disconnect that transaction will be rolled back. Even worse, if your
> program does not commit, but keeps the connection to the db open, the
> transaction will stay open too.

As far as I know both MS SQL and Postgres work just the same as regards explicit and implicit (autocommit) transactions; only the underlying storage/logging mechanisms are different.

Transactions shouldn't make any real difference to the select/join performance being complained about, though. It's already been stated that the insert performance of Postgres far exceeds SQL Server, which is my experience also.

As already suggested, until we see the exact table definitions including indexes etc. there's no real way to tell what the problem is. How many rows are in the second table? It really shouldn't take that much time to read 1,000 rows unless you have a bizarrely slow hard disk.

It would be nice to eliminate any programmatic or driver influence too. How does the SQL select execute in Enterprise Manager for mssql, and in psql or pgAdmin for Postgres?

Cheers,
Gary.
Re: [PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows
On 12/7/2010 2:10 PM, Kenneth Marshall wrote:
> [snip]
> The default of autocommit unless explicitly starting a transaction
> with BEGIN is the normal behavior that I have seen as well.
>
> Cheers,
> Ken

Crikey! You're right. I need to be more careful with my assumptions.

I maintain that people need to be more careful with pg transactions; I've seen several posts about "idle in transaction". But it's not as bad as I made out. My confusion comes from the library I use to hit PG, which fires off a "begin" for me, and if I don't explicitly commit, it gets rolled back. Sorry - I was confused between the framework and PG.

-Andy
Re: [PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows
What I was really after was a quick comparison between the two. I did not create anything special, just the two tables; the insert loop below generated the records for me. I did not tweak anything after installing either system. There was a primary key on the ID field of both tables, but no other indexes in either system. The second table had 1 record in it. The hardware it is running on is fairly good: dual Xeon CPUs, 4 GB of RAM, RAID 5.

Btw, the cost for MS SQL 2008 R2 is ~$14,000 for 2 CPUs, http://www.cdw.com/shop/products/default.aspx?EDC=2167810 . That is why I am pursuing this. :)

Here is the ASP.net code that I was running:

Dim starttime As Date = Date.Now
Dim endtime As Date
Dim reader As NpgsqlDataReader
Dim output2 As String = ""
Dim oConn As New NpgsqlConnection("Server=192.168.1.5;Port=5432;Userid=postgres;Password=12345;Protocol=3;SSL=false;Pooling=true;MinPoolSize=1;MaxPoolSize=20;Encoding=UNICODE;Timeout=15;SslMode=Disable;Database=tomtemp")
oConn.Open()
Dim x As Integer = 0
'For x = 0 To 1000   'uncomment to insert records
'  Dim command As New NpgsqlCommand("insert into pgtemp1(name, address, city, state) values ('Tom " & x & "','123 " & x & " main st','rockford',1)", oConn)   'loop to put 1,000 records in pgtemp1
'  Dim command As New NpgsqlCommand("insert into pgtemp2(statename, stateid, other) values ('Illinois',1,'This is a lot of fun')", oConn)   'only sends 1 record into pgtemp2
'  command.ExecuteNonQuery()
'Next
'join tables and read 1000 rows
Dim command As New NpgsqlCommand("select name,address,city,state,statename,stateid,other from pgtemp1 left join pgtemp2 on state=stateid", oConn)
reader = command.ExecuteReader()
While reader.Read()
    output2 += "<tr><td>" & reader("name") & "</td><td>" & reader("address") & "</td><td>" & reader("city") & "</td><td>" & reader("statename") & "</td><td>" & reader("other") & "</td></tr>"
End While
oConn.Close()
readeroutput.Text = "<table><tr><td>Name:</td><td>Address:</td><td>City:</td><td>State</td><td>Other</td></tr>" & output2 & "</table>"
endtime = Date.Now
Dim runtime As String
runtime = endtime.Subtract(starttime).TotalSeconds
output.Text = starttime.ToString & " " & runtime

The SQL is a straight conversion from the MS SQL code. I did not tweak either system. From EXPLAIN ANALYZE I can see the query itself ran much faster:

Nested Loop Left Join  (cost=0.00..138.04 rows=1001 width=1298) (actual time=0.036..4.679 rows=1001 loops=1)
  Join Filter: (pgtemp1.state = pgtemp2.stateid)
  ->  Seq Scan on pgtemp1  (cost=0.00..122.01 rows=1001 width=788) (actual time=0.010..0.764 rows=1001 loops=1)
  ->  Materialize  (cost=0.00..1.01 rows=1 width=510) (actual time=0.000..0.001 rows=1 loops=1001)
        ->  Seq Scan on pgtemp2  (cost=0.00..1.01 rows=1 width=510) (actual time=0.006..0.008 rows=1 loops=1)
Total runtime: 5.128 ms

The general question comes down to: can I expect decent performance from PostgreSQL compared to MSSQL? I was hoping that PostgreSQL 9.0 would beat MSSQL 2000, since MSSQL 2000 is over 10 years old.

Thanks,
Tom Polak
Rockford Area Association of Realtors
815-395-6776 x203
Re: [PERFORM] Performance under contention
2010/12/7 Robert Haas robertmh...@gmail.com
> On Tue, Dec 7, 2010 at 1:08 PM, Ivan Voras ivo...@freebsd.org wrote:
>> I'm not very familiar with PostgreSQL code but if we're
>> brainstorming... if you're only trying to protect against a small
>> number of expensive operations (like DROP, etc.) that don't really
>> happen often, wouldn't an atomic reference counter be good enough for
>> the purpose (e.g. the expensive operations would spin-wait until the
>> counter is 0)?
>
> No, because (1) busy-waiting is only suitable for locks that will only
> be held for a short time, and an AccessShareLock on a table might be
> held while we read 10GB of data in from disk, and (2) that wouldn't
> allow for deadlock detection.

As far as I understand this thread, the talk is about contention - where a large number of processors want to get a single partition lock in order to take a high-level shared lock. As far as I can see from the source, there is a lot of code executed under the partition lock's protection, like two hash searches (and possibly allocations).

What can be done is that the number of locks can be increased - one could use spin locks for the hash table manipulations, e.g. a lock preventing rehashing (the number of buckets being changed) and a lock for the required bucket. In this case only a small range of code would be protected by the partition lock. As for me, this will make the locking process more cpu-intensive (more locks will be acquired/freed during execution), but will decrease contention (since all but one lock can be spin locks working on atomic counters, and hash searches can be done in parallel), won't it? The thing I am not sure about is how much spinlocks on atomic counters cost today.

-- 
Best regards,
Vitalii Tymchyshyn
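For illustration, a C11 sketch of the per-bucket half of that scheme. This is hypothetical code, not PostgreSQL's dynahash; the table-wide rehash lock is noted in a comment but omitted, and bucket locks are assumed to start out as ATOMIC_FLAG_INIT:

#include <stdatomic.h>
#include <stddef.h>

typedef struct Entry
{
    unsigned      key;
    struct Entry *next;
} Entry;

typedef struct
{
    atomic_flag lock;   /* per-bucket spinlock, init with ATOMIC_FLAG_INIT */
    Entry      *head;
} Bucket;

static void
spin_lock(atomic_flag *f)
{
    while (atomic_flag_test_and_set_explicit(f, memory_order_acquire))
        ;                       /* spin: critical sections here are tiny */
}

static void
spin_unlock(atomic_flag *f)
{
    atomic_flag_clear_explicit(f, memory_order_release);
}

/* Look up key while holding only one bucket's lock, so searches in
 * different buckets proceed in parallel.  A real version would also
 * take the table-wide rehash lock in shared mode here, since nbuckets
 * must not change underneath us - that part is omitted. */
static Entry *
hash_lookup(Bucket *buckets, unsigned nbuckets, unsigned key)
{
    Bucket *b = &buckets[key % nbuckets];
    Entry  *e;

    spin_lock(&b->lock);
    for (e = b->head; e != NULL && e->key != key; e = e->next)
        ;
    spin_unlock(&b->lock);
    return e;       /* caller must guarantee the entry can't be freed */
}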
Re: [PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows
On 12/7/10 1:29 PM, Tom Polak wrote:
> What I was really after was a quick comparison between the two. I did
> not create anything special, just the two tables. [snip] I did not
> tweak anything after installing either system.

That's not a valid test. Postgres is NOT intended to be used "out of the box." The default parameters aren't useful.

> There was a primary key on the ID field of both tables, no indexes
> though in either system. The second table had 1 record in it.
> [snip - test code quoted above]

You still haven't done an ANALYZE sql statement after filling your tables with data. You should execute "analyze pgtemp1" and "analyze pgtemp2" before you do any performance tests. Otherwise your results are meaningless.

Craig
Re: [PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows
On 07/12/2010 9:29 PM, Tom Polak wrote:
> From EXPLAIN ANALYZE I can see the query ran much faster.
> [snip]
> Total runtime: 5.128 ms
>
> The general question comes down to, can I expect decent performance
> from Postgresql compared to MSSQL. I was hoping that Postgresql 9.0
> beat MSSQL 2000 since MS 2000 is over 10 years old.

So postgres actually executed the select in around 5 milliseconds. Pretty good, I would say. The problem therefore lies not with postgres itself, but with what is done with the results afterwards. Assuming that this is purely local and therefore no network issues, perhaps there is a performance issue in this case with the Npgsql driver? Someone who knows more about this driver could perhaps shed some light on this?

I have used .NET (C#) with postgres before, but only using the odbc driver. Perhaps you could try that instead (using OdbcCommand, OdbcDataReader etc.). I mainly use ruby (jruby) with postgres, both under linux and Windows, but I can certainly process 1000 records of similar structure in well under 1 second.

Cheers,
Gary.
Re: [PERFORM] Update problem on large table
On Mon, Dec 6, 2010 at 4:31 PM, felix crucialfe...@gmail.com wrote:
> thanks for the replies! but actually I did figure out how to kill it,
> though pg_cancel_backend didn't work. here's some notes:
> this has been hung for 5 days:
> ns | 32681 | nssql | IDLE in transaction | f | 2010-12-01 15

Right, pg_cancel_backend() isn't going to help when the session you're trying to kill is 'IDLE in transaction' -- there's no query to be killed. If this 'IDLE in transaction' session was causing problems by blocking other transactions, you should look at the application running these queries and figure out why it's hanging out in this state. Staying like that for 5 days is not a good sign, and it can also cause problems with e.g. autovacuum.

[snip]

> but it still will not die
> the docs for pg_ctl state:
> Use pg_ctl --help to see a list of supported signal names.
> doing so does indeed tell me the names:
> HUP INT QUIT ABRT TERM USR1 USR2
> but nothing about them whatsoever :)

I agree this could be better documented. There's a brief mention at:
http://www.postgresql.org/docs/current/static/app-postgres.html#AEN77350

"To cancel a running query, send the SIGINT signal to the process running that command."

though that snippet of information is out of place on a page about the postmaster, and SIGINT vs. SIGTERM for individual backends isn't discussed there at any rate.

At any rate, as you discovered, you have to send SIGTERM to the backend to kill off an 'IDLE in transaction' session. If you're using 8.4 or newer, you have pg_terminate_backend() as a SQL wrapper for SIGTERM. If you're using an older version, be careful, see e.g.
http://archives.postgresql.org/pgsql-admin/2010-04/msg00274.php

Josh
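Put together, the 8.4/9.0 recipe looks like this - a sketch via libpq, run as a superuser; on those versions pg_stat_activity exposes procpid and current_query, and such sessions show a query text of '<IDLE> in transaction':

#include <stdio.h>
#include <libpq-fe.h>

/* Terminate every backend sitting '<IDLE> in transaction', using the
 * pg_terminate_backend() SQL wrapper available from 8.4 on. */
int
main(void)
{
    PGconn *conn = PQconnectdb("dbname=postgres user=postgres"); /* placeholder */
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    PGresult *res = PQexec(conn,
        "SELECT procpid, pg_terminate_backend(procpid) "
        "FROM pg_stat_activity "
        "WHERE current_query = '<IDLE> in transaction'");

    if (PQresultStatus(res) == PGRES_TUPLES_OK)
        for (int i = 0; i < PQntuples(res); i++)
            printf("terminated backend %s\n", PQgetvalue(res, i, 0));

    PQclear(res);
    PQfinish(conn);
    return 0;
}

This kills such sessions indiscriminately; in practice you would add an age filter (and, per the caveat above, not do this at all on pre-8.4 servers).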
Re: [PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows
Tom Polak t...@rockfordarearealtors.org wrote:
> I did not tweak anything after installing either system.

PostgreSQL is set up with defaults such that it will start up and run on the most ancient and underpowered system people are likely to have lying around. It is expected that people will tune it for serious production use, although people often run for years before they hit a case where the tuning makes enough of a difference that they do something about it. For guidelines see this page:

http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

You can get a quick comparison without doing any tuning, but it won't tell you much about how something else compares to PostgreSQL when it is configured for production use.

> The hardware it is running on is fairly good, dual Xeon CPUs, 4 GB of
> RAM, Raid 5.

For comparison, I would set shared_buffers to at least 200 MB, effective_cache_size to 2 to 3 GB, and I would probably drop both seq_page_cost and random_page_cost to 0.1, unless you actually expect to be using a database large enough that the active portion won't be cached. (In that case, a test with tiny tables *really* means nothing, though.) There are other settings that will also help.

> Nested Loop Left Join (cost=0.00..138.04 rows=1001 width=1298)
> (actual time=0.036..4.679 rows=1001 loops=1)
> Total runtime: 5.128 ms

The 0.036 ms is how long it took to produce the first row of the result once it started running, 4.679 ms is the total run time, and 5.128 ms includes miscellaneous other time, such as planning time. Of course, EXPLAIN ANALYZE adds some overhead, so the actual run time would normally be faster, and with tuning it might be faster still.

> The general question comes down to, can I expect decent performance
> from Postgresql compared to MSSQL.

That has been my experience. There's something about your runtime environment which isn't playing well with PostgreSQL. If it were me, I would make sure that as little of my stack as possible depended on products provided by anyone with an interest in seeing PostgreSQL look bad compared to the alternative. I can think of at least one company with fourteen thousand reasons to do so.

-Kevin
Re: [PERFORM] Performance under contention
2010/12/7 Віталій Тимчишин tiv...@gmail.com:
> [snip]
> What can be done is that the number of locks can be increased - one
> could use spin locks for the hash table manipulations, e.g. a lock
> preventing rehashing (the number of buckets being changed) and a lock
> for the required bucket. In this case only a small range of code
> would be protected by the partition lock. As for me, this will make
> the locking process more cpu-intensive (more locks will be
> acquired/freed during execution), but will decrease contention (since
> all but one lock can be spin locks working on atomic counters, and
> hash searches can be done in parallel), won't it?

For what it's worth, this is pretty much the opposite of what I had in mind. I proposed atomic reference counters (as others pointed out, this probably won't work) as poor-man's shared-exclusive locks, so that most operations would not have to contend on them.
Re: [PERFORM] Performance under contention
2010/12/7 Віталій Тимчишин tiv...@gmail.com:
> As far as I can see from the source, there is a lot of code executed
> under the partition lock protection, like two hash searches (and
> possibly allocations).

Yeah, that was my concern, too, though Tom seems skeptical (perhaps rightly). And I'm not really sure why the PROCLOCKs need to be in a hash table anyway - if we know the PROC and LOCK we can surely look up the PROCLOCK pretty expensively by following the PROC SHM_QUEUE.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] Performance under contention
2010/12/7 Robert Haas robertmh...@gmail.com:
> Yeah, that was my concern, too, though Tom seems skeptical (perhaps
> rightly). And I'm not really sure why the PROCLOCKs need to be in a
> hash table anyway - if we know the PROC and LOCK we can surely look up
> the PROCLOCK pretty expensively by following the PROC SHM_QUEUE.

Err, pretty INexpensively.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company