Re: [PERFORM] numerical primary key vs alphanumerical primary key

2013-02-12 Thread Grant Johnson
My experience has been that the performance advantage for numeric keys is 
primarily an Oracle thing.  However, Oracle is popular enough for people to 
assume that it applies to databases in general.

Julien Cigar jci...@ulb.ac.be wrote:

The biggest difference in performance between text and integer keys is 
usually down to whether you're inserting in order or not. Inserting in 
order is tons faster regardless of the type, since it keeps the index 
unfragmented and doesn't cause page splits.

On 02/04/2013 22:52, Anne Rosset wrote:

 Hi,

 I have read a lot of different information about the benefits of
using 
 numerical primary key vs alphanumerical primary key(small size). And 
 what I am gathering is that for performance there is no more great 
 advantage.

 It seems like now RDBMS in general, postgres in particular handles 
 pretty well joins on text indexes.

 Did I understand correctly?

 Thanks,

 Anne


-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-04 Thread Grant Johnson



Yes.  And this has little to do with hints.  It has to do with years
of development lead with THOUSANDS of engineers who can work on the
most esoteric corner cases in their spare time.  Find the pg project a
couple hundred software engineers and maybe we'll catch Oracle a
little quicker.  Otherwise we'll have to marshall our resources to do
the best we can on the project ,and that means avoiding maintenance
black holes and having the devs work on the things that give the most
benefit for the cost.  Hints are something only a tiny percentage of
users could actually use and use well.

Write a check, hire some developers and get the code done and present
it to the community.  If it's good and works it'll likely get
accepted.  Or use EDB, since it has oracle compatibility in it.

I have to disagree with you here.   I have never seen Oracle outperform 
PostgreSQL on complex joins, which is where the planner comes in.  
Perhaps on certain throughput things, but this is likely do to how we 
handle dead rows, and counts, which is definitely because of how dead 
rows are handled, but the easier maintenance makes up for those.  Also 
both of those are by a small percentage.


I have many times had Oracle queries that never finish (OK maybe not 
never, but not over a long weekend) on large hardware, but can be 
finished on PostgreSQL in a matter or minutes on cheap hardware.   This 
happens to the point that often I have set up a PostgreSQL database to 
copy the data to for querying and runnign the complex reports, even 
though the origin of the data was Oracle, since the application was 
Oracle specific.   It took less time to duplicate the database and run 
the query on PostgreSQL than it did to just run it on Oracle.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Grant Johnson



On PostgreSQL, the difference in no hints and hints for that one query 
with skewed data is that the query finishes a little faster.   On some 
others, which shall remain nameless, it is the difference between 
finishing in seconds or days, or maybe never.  Hints can be useful, but 
I can also see why they are not a top priority.  They are rarely needed, 
and only when working around a bug.  If you want them so badly, you have 
the source, write a contrib module  (can you do that on Oracle or 
MSSQL?)  If I have a choice between the developers spending time on 
implementing hints, and spending time on improving the optimiser, I'll 
take the optimiser.


Tom Kyte agrees:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0P11_QUESTION_ID:8912905298920
http://tkyte.blogspot.com/2006/08/words-of-wisdom.html



Oracle can be faster on count queries, but that is the only case I have 
seen.   Generally on most other queries, especially when it involves 
complex joins, or indexes on text fields, PostgreSQL is faster on the 
same hardware.



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any experience using shake defragmenter?

2011-02-01 Thread Grant Johnson



Instead of something like 'shake' (which more or less works, even
though it doesn't use fallocate and friends) I frequently use either
CLUSTER (which is what Greg Smith is suggesting) or a series of ALTER
TABLE ... ALTER COLUMN... which rewrites the table.  With PG 9 perhaps
VACUUM FULL is more appropriate.  Of course, the advice regarding
using 'shake' (or any other defragmenter) on a live postgresql data
directory is excellent - the potential for causing damage if the
database is active during that time is very high.

   
I agree that unless it makes sure there are no open file handles before 
moving the file, there is a high chance of corrupting data, and if it 
does check, there is little chance it will do anything useful on a live 
DB, since it will skip every open file.


Does vacuum full rewrite the whole table, or only the blocks with free 
space?   If it only rewrites the blocks with free space, the only 
solution may be exclusive table lock, alter table to new name, create 
old table name as select * from new table name.  I also like the cluster 
idea, but I am not sure if it rewrites everything, or just the blocks 
that have out of order rows, in which case, it would not work well the 
second time.



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How to best use 32 15k.7 300GB drives?

2011-01-28 Thread Grant Johnson



Putting the WAL on a second controller does help, if you're write-heavy.

I tried separating indexes and data once on one server and didn't 
really notice that it helped much. Managing the space was problematic. 
I would suggest putting those together on a single RAID-10 of all the 
300GB drives (minus a spare). It will probably outperform separate 
arrays most of the time, and be much easier to manage.


--


I like to use RAID 1, and let LVM do the striping.   That way I can add 
more drives later too.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance