[PERFORM] Slow BLOBs restoring

2010-12-07 Thread Vlad Arkhipov
I have encountered a problem while restoring the database. There is a 
table that contains XML data (BLOB), ~ 3 000 000 records, ~ 5.5Gb of 
data. pg_restore has been running for a week without any considerable 
progress. There are plenty of lines like these in the log:


pg_restore: processing item 3125397 BLOB 10001967
pg_restore: executing BLOB 10001967

CPU usage is constantly at 100%. The total database size is about 100 GB, and it
restores in an hour or so without BLOBs.
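
One rough way to check from a second session whether the restore is making any
progress at all (a sketch, not from the original report; pg_largeobject_metadata
only exists on 9.0 and later):

-- 9.0 or later: one row per large object restored so far
SELECT count(*) FROM pg_largeobject_metadata;

-- older servers (this count can itself be slow with millions of BLOBs):
SELECT count(DISTINCT loid) FROM pg_largeobject;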


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance under contention

2010-12-07 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I wonder if it would be possible to have a very short critical section
 where we grab the partition lock, acquire the heavyweight lock, and
 release the partition lock; and then only as a second step record (in
 the form of a PROCLOCK) the fact that we got it.

[ confused... ]  Exactly what do you suppose "acquire the lock" would
be represented as, if not "create a PROCLOCK entry attached to it"?

In any case, I think this is another example of not understanding where
the costs really are.  As far as I can tell, on modern MP systems much
of the elapsed time in these operations comes from acquiring exclusive
access to shared-memory cache lines.  Reducing the number of changes you
have to make within a small area of shared memory won't save much, once
you've paid for the first one.  Changing structures that aren't heavily
contended (such as a proc's list of its own locks) doesn't cost much at
all.

One thing that might be interesting, but that I don't know how to attack
in a reasonably machine-independent way, is to try to ensure that shared
and local data structures don't accidentally overlap within cache lines.
When they do, you pay for fighting the cache line away from another
processor even when there's no real need.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance under contention

2010-12-07 Thread Dave Crooke
Hi Tom

I suspect I may be missing something here, but I think it's a pretty
universal truism that cache lines are aligned to power-of-2 memory
addresses, so it would suffice to ensure during setup that the lower order n
bits of the object address are all zeros for each critical object; if the
malloc() routine being used doesn't support that, it could be done by
allocating a slightly larger than necessary block of memory and choosing a
location within that.

The value of n could be architecture dependent, but n=8 would cover
everyone, hopefully without wasting too much RAM.

Cheers
Dave

On Tue, Dec 7, 2010 at 11:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:


 One thing that might be interesting, but that I don't know how to attack
 in a reasonably machine-independent way, is to try to ensure that shared
 and local data structures don't accidentally overlap within cache lines.
 When they do, you pay for fighting the cache line away from another
 processor even when there's no real need.

regards, tom lane

 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance



[PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows

2010-12-07 Thread Tom Polak
We are in the process of deciding on how to proceed on a database upgrade.
We currently have MS SQL 2000 running on Windows 2003 (on my test server).
I was shocked at the cost for MS SQL 2008 R2 for a new server (2 CPU
license).  I started comparing DB’s and came across postgresql.  It seemed
to be exactly what I was after.  All of our programming is in ASP.net.
Since I am running MSSQL 2000 I have no benefit for .Net integration, so it
is not a concern.



I ran a head to head test of MS SQL 2000 and Postgresql 9.0.  Both are
running on Windows 2003.  What I found was quite surprising and I am
wondering if anyone can point out what is going on here.
Here is the test I ran.
I created 2 tables, the main table had 5 fields with a serial ID field.  The
second table linked to table 1 for a state field.

I had ASP.net via MSSQL create 1,000 records in the main table. Took 9.85
seconds to complete.
Next I had ASP.net via Postgresql create 1,000 records.  Took .65625
seconds.
Postgresql smoked MS SQL server on that test.



Next test is to use ASP.net and join all 1,000 rows with table 2 and then
display the text out.

MS SQL took 0.76 seconds to display
select name,address,city,state,statename,stateid,other from pgtemp1 left
join pgtemp2 on state=stateid



Then I did the same test via Postgresql and it took 8.85 seconds!  I tried
it again as I thought I did something wrong.  I did a few tweaks such as
increasing the shared buffers.  Still the best I could get it to was 7.5
seconds.  This is insanely slow compared to MSSQL 2000.  What am I missing?
Here is my SQL statement for postgresql:
select name,address,city,state,statename,stateid,other from pgtemp1 left
join pgtemp2 on state=stateid



Any ideas on why the Postgres server is so much slower on the joins?  I
am trying to understand what is going on here, so please don't flame me.  Any
advice is appreciated.





Thanks,
Tom Polak
Rockford Area Association of Realtors
The information contained in this email message is intended only for the use
of the individual or entity named.  If the reader of this email is not the
intended recipient or the employee or agent responsible for delivering it to
the intended recipient, you are hereby notified that any dissemination,
distribution or copying of this email is strictly prohibited.  If you have
received this email in error, please immediately notify us by telephone and
reply email.  Thank you.

Although this email and any attachments are believed to be free of any
viruses or other defects that might affect any computer system into which it
is received and opened, it is the responsibility of the recipient to ensure
that it is free of viruses, and the Rockford Area Association of Realtors
hereby disclaims any liability for any loss or damage that results.

Re: [PERFORM] Performance under contention

2010-12-07 Thread Ivan Voras
On 7 December 2010 18:37, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Dec 6, 2010 at 9:59 PM, Jignesh Shah jks...@gmail.com wrote:
 That's exactly what I concluded when I was doing the sysbench simple
 read-only test. I had also tried with different lock partitions and it
 did not help since they all go after the same table. I think one way
 to kind of avoid the problem on the same table is to do more granular
 locking (Maybe at page level instead of table level). But then I dont
 really understand on how to even create a prototype related to this
 one. If you can help create a prototype then I can test it out with my
 setup and see if it helps us to catch up with other guys out there.

 We're trying to lock the table against a concurrent DROP or schema
 change, so locking only part of it doesn't really work.  I don't
 really see any way to avoid needing some kind of a lock here; the
 trick is how to take it quickly.  The main obstacle to making this
 faster is that the deadlock detector needs to be able to obtain enough
 information to break cycles, which means we've got to record in shared
 memory not only the locks that are granted but who has them.

I'm not very familiar with PostgreSQL code but if we're
brainstorming... if you're only trying to protect against a small
number of expensive operations (like DROP, etc.) that don't really
happen often, wouldn't an atomic reference counter be good enough for
the purpose (e.g. the expensive operations would spin-wait until the
counter is 0)?

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance under contention

2010-12-07 Thread Robert Haas
On Tue, Dec 7, 2010 at 12:50 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 I wonder if it would be possible to have a very short critical section
 where we grab the partition lock, acquire the heavyweight lock, and
 release the partition lock; and then only as a second step record (in
 the form of a PROCLOCK) the fact that we got it.

 [ confused... ]  Exactly what do you suppose "acquire the lock" would
 be represented as, if not "create a PROCLOCK entry attached to it"?

Update the granted array and, if necessary, the grantMask.

 In any case, I think this is another example of not understanding where
 the costs really are.

Possible.

 As far as I can tell, on modern MP systems much
 of the elapsed time in these operations comes from acquiring exclusive
 access to shared-memory cache lines.  Reducing the number of changes you
 have to make within a small area of shared memory won't save much, once
 you've paid for the first one.

Seems reasonable.

 Changing structures that aren't heavily
 contended (such as a proc's list of its own locks) doesn't cost much at
 all.

I'm not sure where you're getting the idea that a proc's list of its
own locks isn't heavily contended.   That could be true, but it isn't
obvious to me.  We allocate PROCLOCK structures out of a shared hash
table while holding the lock manager partition lock, and we add every
lock to a queue associated with the PROC and a second queue associated
with the LOCK.  So if two processes acquire an AccessShareLock on the
same table, both the LOCK object and at least the SHM_QUEUE portions
of each PROCLOCK are shared, and those aren't necessarily nearby in
memory.

 One thing that might be interesting, but that I don't know how to attack
 in a reasonably machine-independent way, is to try to ensure that shared
 and local data structures don't accidentally overlap within cache lines.
 When they do, you pay for fighting the cache line away from another
 processor even when there's no real need.

I'd be sort of surprised if this is a problem - as I understand it,
cache lines are small, contiguous chunks, and surely the heap and the
shared memory segment are mapped into different portions of the
address space...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance under contention

2010-12-07 Thread Robert Haas
On Tue, Dec 7, 2010 at 1:08 PM, Ivan Voras ivo...@freebsd.org wrote:
 On 7 December 2010 18:37, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Dec 6, 2010 at 9:59 PM, Jignesh Shah jks...@gmail.com wrote:
 That's exactly what I concluded when I was doing the sysbench simple
 read-only test. I had also tried with different lock partitions and it
 did not help since they all go after the same table. I think one way
 to kind of avoid the problem on the same table is to do more granular
 locking (Maybe at page level instead of table level). But then I dont
 really understand on how to even create a prototype related to this
 one. If you can help create a prototype then I can test it out with my
 setup and see if it helps us to catch up with other guys out there.

 We're trying to lock the table against a concurrent DROP or schema
 change, so locking only part of it doesn't really work.  I don't
 really see any way to avoid needing some kind of a lock here; the
 trick is how to take it quickly.  The main obstacle to making this
 faster is that the deadlock detector needs to be able to obtain enough
 information to break cycles, which means we've got to record in shared
 memory not only the locks that are granted but who has them.

 I'm not very familiar with PostgreSQL code but if we're
 brainstorming... if you're only trying to protect against a small
 number of expensive operations (like DROP, etc.) that don't really
 happen often, wouldn't an atomic reference counter be good enough for
 the purpose (e.g. the expensive operations would spin-wait until the
 counter is 0)?

No, because (1) busy-waiting is only suitable for locks that will only
be held for a short time, and an AccessShareLock on a table might be
held while we read 10GB of data in from disk, and (2) that wouldn't
allow for deadlock detection.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows

2010-12-07 Thread Kevin Grittner
Tom Polak t...@rockfordarearealtors.org wrote:
 
 the best I could get it to was 7.5 seconds.
 
 select name,address,city,state,statename,stateid,other from
 pgtemp1 left join pgtemp2 on state=stateid
 
We'd need a lot more information.  Please read this and post again:
 
http://wiki.postgresql.org/wiki/SlowQueryQuestions
 
Be sure to include hardware info, postgresql.conf settings
(excluding comments), table layouts including indexes and
constraints, and the results of:
 
EXPLAIN ANALYZE select ...
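 
For instance, a minimal sketch of what to gather, using the table names from
the query above (the pg_settings query lists everything changed from the
built-in defaults):
 
-- settings that differ from the defaults
SELECT name, setting FROM pg_settings WHERE source <> 'default';
 
-- table layouts, from psql
\d pgtemp1
\d pgtemp2
 
-- plan and timing of the slow query
EXPLAIN ANALYZE select name,address,city,state,statename,stateid,other
  from pgtemp1 left join pgtemp2 on state=stateid;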
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows

2010-12-07 Thread Andy Colson

On 12/7/2010 11:34 AM, Tom Polak wrote:

We are in the process of deciding on how to proceed on a database
upgrade.  We currently have MS SQL 2000 running on Windows 2003 (on my
test server).  I was shocked at the cost for MS SQL 2008 R2 for a new
server (2 CPU license).  I started comparing DB’s and came across
postgresql.  It seemed to be exactly what I was after.  All of our
programming is in ASP.net.  Since I am running MSSQL 2000 I have no
benefit for .Net integration, so it is not a concern.

I ran a head to head test of MS SQL 2000 and Postgresql 9.0.  Both are
running on Windows 2003.  What I found was quite surprising and I am
wondering if anyone can point out what is going on here.
Here is the test I ran.
I created 2 tables, the main table had 5 fields with a serial ID field.
The second table linked to table 1 for a state field.

I had ASP.net via MSSQL create 1,000 records in the main table. Took
9.85 seconds to complete.
Next I had ASP.net via Postgresql create 1,000 records.  Took .65625
seconds.
Postgresql smoked MS SQL server on that test.


did you play with the postgresql.conf file?  Maybe turn off fsync?  I'd 
guess that in the test above mssql is flushing to disk while PG isn't.




Next test is to use ASP.net and join all 1,000 rows with table 2 and
then display the text out.

MS SQL took 0.76 seconds to display
select name,address,city,state,statename,stateid,other from pgtemp1 left
join pgtemp2 on state=stateid

Then I did the same test via Postgresql and it took 8.85 seconds!  I
tried it again as I thought I did something wrong.  I did a few tweaks
such as increasing the shared buffers.  Still the best I could get it to
was 7.5 seconds.  This is insanely slow compared to MSSQL 2000.  What am
I missing.  Here is my SQL statement for postgresql:
select name,address,city,state,statename,stateid,other from pgtemp1 left
join pgtemp2 on state=stateid

Any ideas on why the Postgres server is so much slower on the
joins?   I am trying to understand what is going on here so please don’t
flame me.  Any advice is appreciated.



Did you create an index?  That'd be my first guess.  Also, can you run 
the sql from the command line client (psql) and see if it takes that 
long?  While you're in psql, stick an 'explain analyze' in front of your 
query, and let's see its output.
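
For example, something along these lines (a sketch using the column names from 
the posted query; whether an index actually helps depends on the real schema 
and row counts):

CREATE INDEX pgtemp1_state_idx ON pgtemp1 (state);
CREATE INDEX pgtemp2_stateid_idx ON pgtemp2 (stateid);

-- then, from psql:
EXPLAIN ANALYZE select name,address,city,state,statename,stateid,other
  from pgtemp1 left join pgtemp2 on state=stateid;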


Also, as a fair warning: mssql doesn't really care about transactions, 
but PG really does.  Make sure all your code is properly starting and 
committing transactions.
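
As a sketch of that pattern (explicit, short transactions around each unit of 
work; note that, as comes up later in the thread, a single statement issued 
outside BEGIN is autocommitted anyway):

BEGIN;
insert into pgtemp1(name, address, city, state)
    values ('Tom 1', '123 1 main st', 'rockford', 1);
-- ... any other statements that belong to the same unit of work ...
COMMIT;  -- without this, an open transaction is rolled back on disconnect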


-Andy

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows

2010-12-07 Thread Andres Freund
On Tuesday 07 December 2010 18:34:25 Tom Polak wrote:
 Then I did the same test via Postgresql and it took 8.85 seconds!  I tried
 it again as I thought I did something wrong.  I did a few tweaks such as
 increasing the shared buffers.  Still the best I could get it to was 7.5
 seconds.  This is insanely slow compared to MSSQL 2000.  What am I missing.
 Here is my SQL statement for postgresql:
 select name,address,city,state,statename,stateid,other from pgtemp1 left
 join pgtemp2 on state=stateid
I think you would at least provide the exact schema and possibly some example 
data (pg_dump) to get us somewhere.

I would suggest you post the output of EXPLAIN ANALYZE $yourquery - that gives 
us information about how that query was executed.

Greetings,

Andres

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance under contention

2010-12-07 Thread Ivan Voras
On 7 December 2010 19:10, Robert Haas robertmh...@gmail.com wrote:

 I'm not very familiar with PostgreSQL code but if we're
 brainstorming... if you're only trying to protect against a small
 number of expensive operations (like DROP, etc.) that don't really
 happen often, wouldn't an atomic reference counter be good enough for
 the purpose (e.g. the expensive operations would spin-wait until the
 counter is 0)?

 No, because (1) busy-waiting is only suitable for locks that will only
 be held for a short time, and an AccessShareLock on a table might be
 held while we read 10GB of data in from disk,

Generally yes, but a variant with adaptive sleeping could possibly be
used if it would be acceptable to delay (uncertainly) the already
expensive and rare operations.

 and (2) that wouldn't
 allow for deadlock detection.

Probably :)

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows

2010-12-07 Thread Craig James

On 12/7/10 9:34 AM, Tom Polak wrote:

We are in the process of deciding on how to proceed on a database upgrade.  We 
currently have MS SQL 2000 running on Windows 2003 (on my test server).  I was 
shocked at the cost for MS SQL 2008 R2 for a new server (2 CPU license).  I 
started comparing DB’s and came across postgresql.  It seemed to be exactly 
what I was after.  All of our programming is in ASP.net.  Since I am running 
MSSQL 2000 I have no benefit for .Net integration, so it is not a concern.

I ran a head to head test of MS SQL 2000 and Postgresql 9.0.  Both are running 
on Windows 2003.  What I found was quite surprising and I am wondering if 
anyone can point out what is going on here.
Here is the test I ran.
I created 2 tables, the main table had 5 fields with a serial ID field.  The 
second table linked to table 1 for a state field.


Did you run ANALYZE on the database after creating it and loading the data?  If 
not, do it and try again (along with the other suggestions you'll get here).  
ANALYZE gathers the statistics that allow the planner to do its job.  Without 
statistics, all bets are off.
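
Concretely, after the test data is loaded (a sketch using the table names from 
your query):

ANALYZE pgtemp1;
ANALYZE pgtemp2;
-- or just:  ANALYZE;   -- the whole database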

Craig


I had ASP.net via MSSQL create 1,000 records in the main table. Took 9.85 
seconds to complete.
Next I had ASP.net via Postgresql create 1,000 records.  Took .65625 seconds.
Postgresql smoked MS SQL server on that test.

Next test is to use ASP.net and join all 1,000 rows with table 2 and then 
display the text out.

MS SQL took 0.76 seconds to display
select name,address,city,state,statename,stateid,other from pgtemp1 left join 
pgtemp2 on state=stateid

Then I did the same test via Postgresql and it took 8.85 seconds!  I tried it 
again as I thought I did something wrong.  I did a few tweaks such as 
increasing the shared buffers.  Still the best I could get it to was 7.5 
seconds.  This is insanely slow compared to MSSQL 2000.  What am I missing.  
Here is my SQL statement for postgresql:
select name,address,city,state,statename,stateid,other from pgtemp1 left join 
pgtemp2 on state=stateid

Any ideas on why the Postgres server is so much slower on the joins?   I am 
trying to understand what is going on here so please don’t flame me.  Any 
advice is appreciated.

Thanks,
Tom Polak
Rockford Area Association of Realtors
The information contained in this email message is intended only for the use of 
the individual or entity named.  If the reader of this email is not the 
intended recipient or the employee or agent responsible for delivering it to 
the intended recipient, you are hereby notified that any dissemination, 
distribution or copying of this email is strictly prohibited.  If you have 
received this email in error, please immediately notify us by telephone and 
reply email.  Thank you.

Although this email and any attachments are believed to be free of any viruses 
or other defects that might affect any computer system into which it is 
received and opened, it is the responsibility of the recipient to ensure that 
it is free of viruses, and the Rockford Area Association of Realtors hereby 
disclaims any liability for any loss or damage that results.




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows

2010-12-07 Thread Andy Colson

On 12/7/2010 1:22 PM, Justin Pitts wrote:


Also, as a fair warning: mssql doesn't really care about transactions, but
PG really does.  Make sure all your code is properly starting and commiting
transactions.

-Andy


I do not understand that statement. Can you explain it a bit better?


In mssql you can write code that connects to the db, fire off updates 
and inserts, and then disconnects.  I believe mssql will keep all your 
changes, and the transaction stuff is done for you.


In PG the first statement you fire off (like an insert into for 
example) will start a transaction.  If you dont commit before you 
disconnect that transaction will be rolled back.  Even worse, if your 
program does not commit, but keeps the connection to the db open, the 
transaction will stay open too.


There are differences in the way mssql and pg do transactions.  mssql 
uses a transaction log and keeps current data in the table.  In mssql if 
you open a transaction and write a bunch of stuff, the table contains 
that new stuff.  Everyone can see it.  (I think default transaction 
isolation level is read committed).  But if you set your isolation level 
to something with repeatable read, then your program will block and have 
to wait on every little change to the table.  (or, probably page.. I 
think mssql has page level locking?)


anyway, in PG, multiple versions of the same row are kept, and when you 
open, and keep open a transaction, PG has to keep a version of the row 
for every change that other people make.  So a long lasting transaction 
could create hundreds of versions of one row.  Then when somebody goes 
to select against that table, it has to scan not only the rows, but 
every version of every row!


So my point is, in PG, use transactions as they were meant to be used, 
as single atomic operations.  Start, do some work, commit.


mssql made it easy to ignore transactions by doing it for you.  Ignoring 
transactions in PG will hurt you.


you can google "MVCC" and "postgres idle in transaction" for more.
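
On 9.0-era servers you can also spot such sessions directly (a sketch; the 
column names changed in later releases):

SELECT procpid, usename, xact_start, current_query
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction';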

-Andy

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows

2010-12-07 Thread Richard Broersma
On Tue, Dec 7, 2010 at 11:43 AM, Andy Colson a...@squeakycode.net wrote:

 In PG the first statement you fire off (like an insert into for example)
 will start a transaction.  If you dont commit before you disconnect that
 transaction will be rolled back.  Even worse, if your program does not
 commit, but keeps the connection to the db open, the transaction will stay
 open too.

Huh - is this new?  I always thought that every statement was wrapped
in its own transaction unless you explicitly start your own.  So you
shouldn't need to commit before closing a connection if you never
opened a transaction to begin with.


-- 
Regards,
Richard Broersma Jr.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows

2010-12-07 Thread Kenneth Marshall
On Tue, Dec 07, 2010 at 11:56:51AM -0800, Richard Broersma wrote:
 On Tue, Dec 7, 2010 at 11:43 AM, Andy Colson a...@squeakycode.net wrote:
 
  In PG the first statement you fire off (like an insert into for example)
  will start a transaction.  If you dont commit before you disconnect that
  transaction will be rolled back.  Even worse, if your program does not
  commit, but keeps the connection to the db open, the transaction will stay
  open too.
 
 Huh - is this new?  I always thought that every statement was wrapped
 in its own transaction unless you explicitly start your own.  So you
 shouldn't need to commit before closing a connection if you never
 opened a transaction to begin with.
 
 
 -- 
 Regards,
 Richard Broersma Jr.
 

The default of autocommit unless explicitly starting a transaction with
BEGIN is the normal behavior that I have seen as well.

Cheers,
Ken

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows

2010-12-07 Thread Gary Doades

On 07/12/2010 7:43 PM, Andy Colson wrote:

On 12/7/2010 1:22 PM, Justin Pitts wrote:


Also, as a fair warning: mssql doesn't really care about 
transactions, but
PG really does.  Make sure all your code is properly starting and 
commiting

transactions.

-Andy


I do not understand that statement. Can you explain it a bit better?


In mssql you can write code that connects to the db, fire off updates 
and inserts, and then disconnects.  I believe mssql will keep all your 
changes, and the transaction stuff is done for you.


In PG the first statement you fire off (like an insert into for 
example) will start a transaction.  If you dont commit before you 
disconnect that transaction will be rolled back.  Even worse, if your 
program does not commit, but keeps the connection to the db open, the 
transaction will stay open too.
As far as I know both MS SQL and Postgres work just the same as 
regards explicit and implicit (autocommit) transactions; only the 
underlying storage/logging mechanisms are different.


Transactions shouldn't make any real difference to the select/join 
performance being complained about though. It's already stated that the 
insert performance of postgres far exceeds SQL Server, which is my 
experience also.


As already suggested, until we see the exact table definitions including 
indexes etc. there's no real way to tell what the problem is. How many 
rows are in the second table? It really shouldn't take that much time to 
read 1000 rows unless you have a bizarrely slow hard disk.


It would be nice to eliminate any programmatic or driver influence too. 
How does the SQL select execute in enterprise manager for mssql and psql 
or pgadmin for postgres?
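
For example, in psql, with nothing but the database in the picture:

\timing
select name,address,city,state,statename,stateid,other from pgtemp1 left join pgtemp2 on state=stateid;

If that comes back in a few milliseconds while the ASP.net page still takes 
seconds, the time is going into the driver or page rendering rather than the 
query itself.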


Cheers,
Gary.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows

2010-12-07 Thread Andy Colson

On 12/7/2010 2:10 PM, Kenneth Marshall wrote:

On Tue, Dec 07, 2010 at 11:56:51AM -0800, Richard Broersma wrote:

On Tue, Dec 7, 2010 at 11:43 AM, Andy Colsona...@squeakycode.net  wrote:


In PG the first statement you fire off (like an insert into for example)
will start a transaction.  If you dont commit before you disconnect that
transaction will be rolled back.  Even worse, if your program does not
commit, but keeps the connection to the db open, the transaction will stay
open too.


Huh - is this new?  I always thought that every statement was wrapped
in its own transaction unless you explicitly start your own.  So you
shouldn't need to commit before closing a connection if you never
opened a transaction to begin with.


--
Regards,
Richard Broersma Jr.



The default of autocommit unless explicitly starting a transaction with
BEGIN is the normal behavior that I have seen as well.

Cheers,
Ken


Crikey!  You're right.  I need to be more careful with my assumptions.

I maintain that people need to be more careful with pg transactions. 
I've seen several posts about idle in transaction.  But it's not as bad 
as I made out.  My confusion comes from the library I use to hit PG, 
which fires off a begin for me, and if I don't explicitly commit, it 
gets rolled back.


Sorry, I was confused between the framework and PG.

-Andy

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows

2010-12-07 Thread Tom Polak
What I was really after was a quick comparison between the two.  I did not
create anything special, just the two tables.  One table SQL generated the
records for me.  I did not tweak anything after installing either system.
There was a primary key on the ID field of both tables, no indexes though
in either system.  The second table had 1 record in it.  The hardware it
is running on is fairly good, dual Xeon CPUs, 4 GB of RAM, Raid 5.  Btw,
the cost for MS SQL 2008 R2 is ~$14,000 for 2 cpus,
http://www.cdw.com/shop/products/default.aspx?EDC=2167810 .  That is why I
am pursuing this.  :)

Here is the ASP.net code that I was running:

Dim starttime As Date = Date.Now
Dim endtime As Date
Dim reader As NpgsqlDataReader
Dim output2 As String = ""

Dim oConn As New NpgsqlConnection("Server=192.168.1.5;Port=5432;Userid=postgres;Password=12345;Protocol=3;SSL=false;Pooling=true;MinPoolSize=1;MaxPoolSize=20;Encoding=UNICODE;Timeout=15;SslMode=Disable;Database=tomtemp")
oConn.Open()
Dim x As Integer = 0
'For x = 0 To 1000 'uncomment to insert records.
'Dim command As New NpgsqlCommand("insert into pgtemp1(name, address, city, state) values ('Tom " & x & "','123 " & x & " main st','rockford',1)", oConn) 'meant for loop to put in 1,000 records in pgtemp1 table
'Dim command As New NpgsqlCommand("insert into pgtemp2(statename, stateid, other) values ('Illinois',1,'This is a lot of fun')", oConn) 'only sends 1 record into the table pgtemp2
'command.ExecuteNonQuery()
'Next

'join table and read 1000 rows.
Dim command As New NpgsqlCommand("select name,address,city,state,statename,stateid,other from pgtemp1 left join pgtemp2 on state=stateid", oConn)
reader = command.ExecuteReader()
While reader.read()
    output2 += "<tr><td>" & reader("name") & "</td><td>" & reader("address") & "</td><td>" & reader("city") & "</td><td>" & reader("statename") & "</td><td>" & reader("other") & "</td></tr>"
End While
oConn.Close()
readeroutput.text = "<table><tr><td>Name:</td><td>Address:</td><td>City:</td><td>State</td><td>Other</td></tr>" & output2 & "</table>"

endtime = Date.Now
Dim runtime As String
runtime = endtime.Subtract(starttime).TotalSeconds
output.text = starttime.ToString & " " & runtime

The SQL is a straight convert from MS SQL code.  I did not tweak either
system.

From EXPLAIN ANALYZE I can see the query ran much faster.
Nested Loop Left Join  (cost=0.00..138.04 rows=1001 width=1298) (actual time=0.036..4.679 rows=1001 loops=1)
  Join Filter: (pgtemp1.state = pgtemp2.stateid)
  ->  Seq Scan on pgtemp1  (cost=0.00..122.01 rows=1001 width=788) (actual time=0.010..0.764 rows=1001 loops=1)
  ->  Materialize  (cost=0.00..1.01 rows=1 width=510) (actual time=0.000..0.001 rows=1 loops=1001)
        ->  Seq Scan on pgtemp2  (cost=0.00..1.01 rows=1 width=510) (actual time=0.006..0.008 rows=1 loops=1)
Total runtime: 5.128 ms

The general question comes down to: can I expect decent performance from
Postgresql compared to MSSQL?  I was hoping that Postgresql 9.0 would beat MSSQL
2000, since MS SQL 2000 is over 10 years old.

Thanks,
Tom Polak
Rockford Area Association of Realtors
815-395-6776 x203

The information contained in this email message is intended only for the
use of the individual or entity named.  If the reader of this email is not
the intended recipient or the employee or agent responsible for delivering
it to the intended recipient, you are hereby notified that any
dissemination, distribution or copying of this email is strictly
prohibited.  If you have received this email in error, please immediately
notify us by telephone and reply email.  Thank you.

Although this email and any attachments are believed to be free of any
viruses or other defects that might affect any computer system into which
it is received and opened, it is the responsibility of the recipient to
ensure that it is free of viruses, and the Rockford Area Association of
Realtors hereby disclaims any liability for any loss or damage that
results.


-Original Message-
From: pgsql-performance-ow...@postgresql.org
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Andy Colson
Sent: Tuesday, December 07, 2010 2:23 PM
To: Kenneth Marshall
Cc: Richard Broersma; Justin Pitts; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows

On 12/7/2010 2:10 PM, Kenneth Marshall wrote:
 On Tue, Dec 07, 2010 at 11:56:51AM -0800, Richard Broersma wrote:
 On Tue, Dec 7, 2010 at 11:43 AM, Andy Colsona...@squeakycode.net
wrote:

 In PG the first statement you fire off (like an insert into for example)
 will start a transaction.  If you dont commit before you disconnect that
 transaction will be rolled back.  Even worse, if your program does not
 commit, but keeps the connection to the db open, the transaction will stay
 open too.

 Huh - is this new?  I always thought that every statement was wrapped
 in its own transaction unless you explicitly start 

Re: [PERFORM] Performance under contention

2010-12-07 Thread Віталій Тимчишин
2010/12/7 Robert Haas robertmh...@gmail.com

 On Tue, Dec 7, 2010 at 1:08 PM, Ivan Voras ivo...@freebsd.org wrote:

  I'm not very familiar with PostgreSQL code but if we're
  brainstorming... if you're only trying to protect against a small
  number of expensive operations (like DROP, etc.) that don't really
  happen often, wouldn't an atomic reference counter be good enough for
  the purpose (e.g. the expensive operations would spin-wait until the
  counter is 0)?

 No, because (1) busy-waiting is only suitable for locks that will only
 be held for a short time, and an AccessShareLock on a table might be
 held while we read 10GB of data in from disk, and (2) that wouldn't
 allow for deadlock detection.


As far as I understand this thread, the talk is about contention - where a
large number of processors want to get a single partition lock in order to take
a high-level shared lock.
As far as I can see from the source, there is a lot of code executed under
the partition lock protection, like two hash searches (and possibly
allocations).
What can be done, is that number of locks can be increased - one could use
spin locks for hash table manipulations, e.g. a lock preventing rehashing
(number of baskets being changed) and a lock for required basket.
In this case only small range of code can be protected by partition lock.
As for me, this will make the locking process more cpu-intensive (more locks
will be acquired/freed during the execution), but will decrease contention
(since all but one lock can be spin locks working on atomic counters, hash
searches can be done in parallel), won't it?
The thing I am not sure in is how much spinlocks on atomic counters cost
today.

-- 
Best regards,
 Vitalii Tymchyshyn


Re: [PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows

2010-12-07 Thread Craig James

On 12/7/10 1:29 PM, Tom Polak wrote:

What I was really after was a quick comparison between the two.  I did not
create anything special, just the two tables.  One table SQL generated the
records for me.  I did not tweak anything after installing either system.


That's not a valid test.  Postgres is NOT intended to be used out of the box.  
The default parameters aren't useful.


There was a primary key on the ID field of both tables, no indexes though
in either system.  The second table had 1 record in it.  The hardware it
is running on is fairly good, dual Xeon CPUs, 4 GB of RAM, Raid 5.  Btw,
the cost for MS SQL 2008 R2 is ~$14,000 for 2 cpus,
http://www.cdw.com/shop/products/default.aspx?EDC=2167810 .  That is why I
am pursuing this.  :)

Here is the ASP.net code that I was running
Dim starttime As Date = Date.Now
  Dim endtime As Date
  Dim reader As NpgsqlDataReader
  Dim output2 As String = ""

  Dim oConn As New NpgsqlConnection("Server=192.168.1.5;Port=5432;Userid=postgres;Password=12345;Protocol=3;SSL=false;Pooling=true;MinPoolSize=1;MaxPoolSize=20;Encoding=UNICODE;Timeout=15;SslMode=Disable;Database=tomtemp")
  oConn.Open()
  Dim x As Integer = 0
  'For x = 0 To 1000 'uncomment to insert records.
  'Dim command As New NpgsqlCommand("insert into pgtemp1(name, address, city, state) values ('Tom " & x & "','123 " & x & " main st','rockford',1)", oConn) 'meant for loop to put in 1,000 records in pgtemp1 table
  'Dim command As New NpgsqlCommand("insert into pgtemp2(statename, stateid, other) values ('Illinois',1,'This is a lot of fun')", oConn) 'only sends 1 record into the table pgtemp2
  'command.ExecuteNonQuery()
  'Next


You still haven't done an ANALYZE sql statement after filling your tables with data.  You should 
execute "analyze pgtemp1" and "analyze pgtemp2" before you do any performance 
tests.  Otherwise your results are meaningless.

Craig



  'join table and read 1000 rows.
  Dim command As New NpgsqlCommand("select name,address,city,state,statename,stateid,other from pgtemp1 left join pgtemp2 on state=stateid", oConn)
  reader = command.ExecuteReader()
  While reader.read()
      output2 += "<tr><td>" & reader("name") & "</td><td>" & reader("address") & "</td><td>" & reader("city") & "</td><td>" & reader("statename") & "</td><td>" & reader("other") & "</td></tr>"
  End While
  oConn.Close()
  readeroutput.text = "<table><tr><td>Name:</td><td>Address:</td><td>City:</td><td>State</td><td>Other</td></tr>" & output2 & "</table>"

  endtime = Date.Now
  Dim runtime As String
  runtime = endtime.Subtract(starttime).TotalSeconds
  output.text = starttime.ToString & " " & runtime

The SQL is a straight convert from MS SQL code.  I did not tweak either
system.


From EXPLAIN ANALYZE I can see the query ran much faster.

Nested Loop Left Join  (cost=0.00..138.04 rows=1001 width=1298) (actual time=0.036..4.679 rows=1001 loops=1)
  Join Filter: (pgtemp1.state = pgtemp2.stateid)
  ->  Seq Scan on pgtemp1  (cost=0.00..122.01 rows=1001 width=788) (actual time=0.010..0.764 rows=1001 loops=1)
  ->  Materialize  (cost=0.00..1.01 rows=1 width=510) (actual time=0.000..0.001 rows=1 loops=1001)
        ->  Seq Scan on pgtemp2  (cost=0.00..1.01 rows=1 width=510) (actual time=0.006..0.008 rows=1 loops=1)
Total runtime: 5.128 ms

The general question comes down to, can I expect decent perfomance from
Postgresql compared to MSSQL.  I was hoping that Postgresql 9.0 beat MSSQL
2000 since MS 2000 is over 10 years old.

Thanks,
Tom Polak
Rockford Area Association of Realtors
815-395-6776 x203

The information contained in this email message is intended only for the
use of the individual or entity named.  If the reader of this email is not
the intended recipient or the employee or agent responsible for delivering
it to the intended recipient, you are hereby notified that any
dissemination, distribution or copying of this email is strictly
prohibited.  If you have received this email in error, please immediately
notify us by telephone and reply email.  Thank you.

Although this email and any attachments are believed to be free of any
viruses or other defects that might affect any computer system into which
it is received and opened, it is the responsibility of the recipient to
ensure that it is free of viruses, and the Rockford Area Association of
Realtors hereby disclaims any liability for any loss or damage that
results.


-Original Message-
From: pgsql-performance-ow...@postgresql.org
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Andy Colson
Sent: Tuesday, December 07, 2010 2:23 PM
To: Kenneth Marshall
Cc: Richard Broersma; Justin Pitts; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows

On 12/7/2010 2:10 PM, Kenneth Marshall wrote:

On Tue, Dec 07, 2010 at 11:56:51AM -0800, Richard Broersma wrote:

On Tue, Dec 7, 2010 at 11:43 AM, Andy Colsona...@squeakycode.net

wrote:



In PG 

Re: [PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows

2010-12-07 Thread Gary Doades

On 07/12/2010 9:29 PM, Tom Polak wrote:


 From EXPLAIN ANALYZE I can see the query ran much faster.
Nested Loop Left Join  (cost=0.00..138.04 rows=1001 width=1298) (actual time=0.036..4.679 rows=1001 loops=1)
  Join Filter: (pgtemp1.state = pgtemp2.stateid)
  ->  Seq Scan on pgtemp1  (cost=0.00..122.01 rows=1001 width=788) (actual time=0.010..0.764 rows=1001 loops=1)
  ->  Materialize  (cost=0.00..1.01 rows=1 width=510) (actual time=0.000..0.001 rows=1 loops=1001)
        ->  Seq Scan on pgtemp2  (cost=0.00..1.01 rows=1 width=510) (actual time=0.006..0.008 rows=1 loops=1)
Total runtime: 5.128 ms

The general question comes down to, can I expect decent perfomance from
Postgresql compared to MSSQL.  I was hoping that Postgresql 9.0 beat MSSQL
2000 since MS 2000 is over 10 years old.

So postgres actually executed the select in around 5 milliseconds. 
Pretty good, I would say. The problem therefore lies not with postgres 
itself, but with what is done with the results afterwards. Assuming that this 
is purely local and therefore there are no network issues, perhaps there is a 
performance issue in this case with the Npgsql driver? Someone who knows 
more about this driver could perhaps shed some light on this?


I have used .NET (C#) with postgres before, but only using the odbc 
driver. Perhaps you could try that instead (using OdbcCommand, 
OdbcDataReader etc.).


I mainly use ruby (jruby) with postgres both under linux and Windows, 
but I can certainly process 1000 records of similar structure in well 
under 1 second.


Cheers,
Gary.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Update problem on large table

2010-12-07 Thread Josh Kupershmidt
On Mon, Dec 6, 2010 at 4:31 PM, felix crucialfe...@gmail.com wrote:

 thanks for the replies!
 but actually I did figure out how to kill it,
 but pg_cancel_backend didn't work.  here are some notes:
 this has been hung for 5 days:
 ns      |   32681 | nssql   | IDLE in transaction | f       | 2010-12-01
 15

Right, pg_cancel_backend() isn't going to help when the session you're
trying to kill is 'IDLE in transaction' -- there's no query to be
killed. If this 'IDLE in transaction' session was causing problems
by blocking other transactions, you should look at the application
running these queries and figure out why it's hanging out in this
state. Staying like that for 5 days is not a good sign, and can also
cause problems with e.g. autovacuum.
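
If it is blocking anything, the locks that session still holds are easy to see
(a sketch, using the pid 32681 from the pg_stat_activity output you posted):

SELECT relation::regclass AS locked_relation, mode, granted
FROM pg_locks
WHERE pid = 32681 AND relation IS NOT NULL;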

[snip]

 but it still will not die
 the docs for pg_ctl state:
 Use "pg_ctl --help" to see a list of supported signal names.
 doing so does indeed tell me the names:
 HUP INT QUIT ABRT TERM USR1 USR2
 but nothing about them whatsoever :)

I agree this could be better documented. There's a brief mention at:
  http://www.postgresql.org/docs/current/static/app-postgres.html#AEN77350
  "To cancel a running query, send the SIGINT signal to the process
running that command."

though that snippet of information is out-of-place on a page about the
postmaster, and SIGINT vs. SIGTERM for individual backends isn't
discussed there at any rate.

At any rate, as you discovered, you have to send SIGTERM to the
backend to kill off an 'IDLE in transaction' session. If you're
using 8.4 or newer, you have pg_terminate_backend() as a SQL wrapper
for SIGTERM. If you're using an older version, be careful, see e.g.
  http://archives.postgresql.org/pgsql-admin/2010-04/msg00274.php
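
For example (a sketch, using the pid from the pg_stat_activity line you posted):

-- 8.4 or later, from a superuser SQL session:
SELECT pg_terminate_backend(32681);

-- older releases: send the signal from the shell on the database server instead
-- kill -TERM 32681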

Josh

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows

2010-12-07 Thread Kevin Grittner
Tom Polak t...@rockfordarearealtors.org wrote:
 
 I did not tweak anything after installing either system.
 
PostgreSQL is set up with defaults such that it will start up and
run on the most ancient and underpowered system people are likely to
have lying around.  It is expected that people will tune it for
serious production use, although people often run for years before
they hit a case where the tuning makes enough of a difference that
they do something about it.  For guidelines see this page:
 
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
 
You can get a quick comparison without doing any tuning, but it
won't tell you much about how something else compares to PostgreSQL
when it is configured for production use.
 
 The hardware it is running on is fairly good, dual Xeon CPUs, 4 GB
 of RAM, Raid 5.
 
For comparison, I would set shared_buffers to at least 200 MB,
effective_cache_size to 2 to 3 GB, and I would probably drop both
seq_page_cost and random_page_cost to 0.1, unless you actually
expect to be using a database large enough that the active portion
won't be cached.  (In that case, a test with tiny tables *really*
means nothing, though.)  There are other settings that will also
help.
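 
Put concretely, a sketch of those numbers (shared_buffers requires editing
postgresql.conf and restarting; the planner settings can also be tried per
session):
 
-- postgresql.conf:
--   shared_buffers = 200MB
--   effective_cache_size = 2GB
 
-- per-session experiment with the page costs:
SET seq_page_cost = 0.1;
SET random_page_cost = 0.1;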
 
 Nested Loop Left Join  (cost=0.00..138.04 rows=1001 width=1298)
 (actual time=0.036..4.679 rows=1001 loops=1)
 
 Total runtime: 5.128 ms
 
The 0.036 ms is how long it took to produce the first row of the
result once it started running, 4.679 ms is the total run time, and
5.128 includes miscellaneous other time, such as planning time.  Of
course, the EXPLAIN ANALYZE adds some overhead, so the actual run
time would normally be faster, and with tuning it might be still
faster.
 
 The general question comes down to, can I expect decent perfomance
 from Postgresql compared to MSSQL.
 
That has been my experience.  There's something about your runtime
environment which isn't playing well with PostgreSQL.  If it were
me, I would make sure that as little of my stack as possible
depended on products provided by anyone with an interest in seeing
PostgreSQL look bad compared to the alternative.  I can think of at
least one company with fourteen thousand reasons to do so.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance under contention

2010-12-07 Thread Ivan Voras
2010/12/7 Віталій Тимчишин tiv...@gmail.com:


 2010/12/7 Robert Haas robertmh...@gmail.com

 On Tue, Dec 7, 2010 at 1:08 PM, Ivan Voras ivo...@freebsd.org wrote:

  I'm not very familiar with PostgreSQL code but if we're
  brainstorming... if you're only trying to protect against a small
  number of expensive operations (like DROP, etc.) that don't really
  happen often, wouldn't an atomic reference counter be good enough for
  the purpose (e.g. the expensive operations would spin-wait until the
  counter is 0)?

 No, because (1) busy-waiting is only suitable for locks that will only
 be held for a short time, and an AccessShareLock on a table might be
 held while we read 10GB of data in from disk, and (2) that wouldn't
 allow for deadlock detection.

 What can be done, is that number of locks can be increased - one could use
 spin locks for hash table manipulations, e.g. a lock preventing rehashing
 (number of baskets being changed) and a lock for required basket.
 In this case only small range of code can be protected by partition lock.
 As for me, this will make locking process more cpu-intensive (more locks
 will be acquired/freed during the exection), but will decrease contention
 (since all but one lock can be spin locks working on atomic counters, hash
 searches can be done in parallel), won't it?

For what it's worth, this is pretty much the opposite of what I had in
mind. I proposed atomic reference counters (as others pointed out, this
probably won't work) as poor-man's shared-exclusive locks, so that
most operations would not have to contend on them.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance under contention

2010-12-07 Thread Robert Haas
2010/12/7 Віталій Тимчишин tiv...@gmail.com:
 As far as I can see from the source, there is a lot of code executed under
 the partition lock protection, like two hash searches (and possibly
 allocations).

Yeah, that was my concern, too, though Tom seems skeptical (perhaps
rightly).  And I'm not really sure why the PROCLOCKs need to be in a
hash table anyway - if we know the PROC and LOCK we can surely look up
the PROCLOCK pretty expensively by following the PROC SHM_QUEUE.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance under contention

2010-12-07 Thread Robert Haas
2010/12/7 Robert Haas robertmh...@gmail.com:
 2010/12/7 Віталій Тимчишин tiv...@gmail.com:
 As far as I can see from the source, there is a lot of code executed under
 the partition lock protection, like two hash searches (and possibly
 allocations).

 Yeah, that was my concern, too, though Tom seems skeptical (perhaps
 rightly).  And I'm not really sure why the PROCLOCKs need to be in a
 hash table anyway - if we know the PROC and LOCK we can surely look up
 the PROCLOCK pretty expensively by following the PROC SHM_QUEUE.

Err, pretty INexpensively.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance