Re: [PERFORM] choosing the right RAID level for PostgreSQL database
On Sun, 13 Feb 2011, Dave Crooke wrote:

> For any database, anywhere, the answer is pretty much always RAID-10. The
> only time you would do anything else is for odd special cases.

There are two situations where you would opt for something other than RAID-10:

1. If you need the space that RAID-6 gives you compared to RAID-10, you may
   not have much choice.

2. If you do almost no updates to the disks during the time you are doing
   the reads, then RAID-6 can be at least as fast as RAID-10 in non-degraded
   mode (it could be faster if you are able to use faster parts of the disks
   in RAID-6 than you could in RAID-10). Degraded mode suffers more, but you
   can tolerate any two drives failing, rather than just any one drive for
   RAID-10 (the wrong two drives failing can kill a RAID-10, while if the
   right drives fail you can lose a lot more drives in a RAID-10).

Where RAID-6 is significantly slower than RAID-10 is when you are doing
small random writes. Also, the performance variation between RAID
controllers will be much higher with RAID-6 than with RAID-10.

David Lang

> Cheers
> Dave
>
> On Sun, Feb 13, 2011 at 2:12 PM, sergey <sergey.on@gmail.com> wrote:
>
>> Hello,
>>
>> I got a disk array appliance of 8 disks, 1T each (UltraStor RS8IP4). It
>> will be used solely by a PostgreSQL database and I am trying to choose
>> the best RAID level for it.
>>
>> The top priority is read performance, since we operate on large data sets
>> (tables, indexes) and we do lots of searches/scans, joins and nested
>> queries. With the old disks that we have now, most slowdowns happen on
>> SELECTs.
>>
>> Fault tolerance is less important; it can be 1 or 2 disks. Space is the
>> least important factor. Even 1T will be enough.
>>
>> Which RAID level would you recommend in this situation? The current
>> options are 60, 50 and 10, but other options could be even better.
>>
>> Thank you!

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
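[Editorial aside: the capacity/fault-tolerance trade-off discussed above can be sketched with simple arithmetic. This is a minimal illustration for the poster's 8 x 1T array, not part of the original thread; the function names are made up for the example.]

```python
def raid10_usable_tb(n_disks, disk_tb):
    # RAID-10 mirrors pairs of drives, so half the raw capacity is usable.
    # It survives any single failure, and more only if failures land in
    # different mirror pairs.
    return (n_disks // 2) * disk_tb

def raid6_usable_tb(n_disks, disk_tb):
    # RAID-6 spends two drives' worth of capacity on parity, and survives
    # any two simultaneous drive failures.
    return (n_disks - 2) * disk_tb

# The original poster's array: 8 drives of 1 TB each.
print(raid10_usable_tb(8, 1))  # 4 TB usable
print(raid6_usable_tb(8, 1))   # 6 TB usable
```

This is why point 1 above matters: on the same 8 drives, RAID-6 yields 50% more usable space while still tolerating any two failures.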
Re: [PERFORM] How to boost performance of queries containing pattern matching characters
> How can we boost performance of queries containing pattern-matching
> characters? In my case, we're using a percent sign (%) that matches any
> string of zero or more characters.
>
> QUERY: DELETE FROM MYTABLE WHERE EMAIL ILIKE '%domain.com%'
>
> The EMAIL column is VARCHAR(256). As is clear from the above query, the
> email is matched partially and case-insensitively, which my application
> requirement demands. If it were a full match, I could easily define a
> functional index on the EMAIL column (lower(EMAIL)) and rewrite my DELETE
> criteria as lower(EMAIL) = 'someem...@domain.com'. MYTABLE currently
> contains 2 million records and grows consistently.

I had almost the same problem. To resolve it, I created my own text search
parser (myftscfg) which divides the text in the column into three-letter
parts. For example, someem...@domain.com is divided into:

som, ome, mee, eem, ema, mai, ail, il@, l@d, @do, dom, oma, mai, ain, in.,
n.c, .co, com

There should also be an index on the email column:

CREATE INDEX email_fts ON mytable
    USING gin (to_tsvector('myftscfg'::regconfig, email));

Every query like email ILIKE '%domain.com%' should be rewritten as:

WHERE to_tsvector('myftscfg', email) @@ to_tsquery('dom')
  AND to_tsvector('myftscfg', email) @@ to_tsquery('oma')
  AND to_tsvector('myftscfg', email) @@ to_tsquery('mai')
  AND to_tsvector('myftscfg', email) @@ to_tsquery('ain')
  AND to_tsvector('myftscfg', email) @@ to_tsquery('in.')
  AND to_tsvector('myftscfg', email) @@ to_tsquery('n.c')
  AND to_tsvector('myftscfg', email) @@ to_tsquery('.co')
  AND to_tsvector('myftscfg', email) @@ to_tsquery('com')
  AND email ILIKE '%domain.com%';

The index reduces the number of candidate records, and the final
email ILIKE '%domain.com%' clause keeps only the truly matching ones.
I didn't find a better solution.

---
Artur Zajac
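[Editorial aside: the three-letter tokenization described above is easy to reproduce. The sketch below assumes the redacted sample address was "someemail@domain.com", since that string produces exactly the token list in the post; the address is only an illustrative guess.]

```python
def three_char_tokens(text):
    """Split a string into overlapping three-character tokens, mimicking
    the custom text-search parser (myftscfg) described in the post."""
    return [text[i:i + 3] for i in range(len(text) - 2)]

tokens = three_char_tokens("someemail@domain.com")
print(tokens)
# Note that 'mai' appears twice (once from "mail", once from "main"),
# matching the duplicated token in the post's example split.
```

A query term like '%domain.com%' is then answered by intersecting the postings for its own three-letter tokens, with the original ILIKE as a final recheck.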
Re: [PERFORM] How to boost performance of queries containing pattern matching characters
> The closest you can do is something like Artur's option (or the pg_trgm
> module - handy since you are looking at 3-chars and up) to select likely
> matches, combined with a separate search on '%domain.com%' to confirm
> that fact.

Thanks for your suggestion. Our production server is currently running
PostgreSQL v8.2.3. I think the pg_trgm contrib module is not available for
the 8.2 series.

Also, I read about WildSpeed - fast wildcard search for the LIKE operator.
What is your opinion on that?
http://www.sai.msu.su/~megera/wiki/wildspeed
Re: [PERFORM] How to boost performance of queries containing pattern matching characters
> Looks like you've almost re-invented the trigram module:
> http://www.postgresql.org/docs/9.0/static/pgtrgm.html

I didn't know about this module. The idea of using three-letter strings
with full text search is the same, but the rest is not.

Is using similarity really correct for this problem? How should the query
be constructed so that it returns all records matching the ILIKE criteria?

---
Artur Zajac
[PERFORM] performance issue in the fields.
create table a (address1 int, address2 int, address3 int);
create table b (address int[3]);

I have created two tables. In the first table I am using separate fields to
store 3 addresses; in table b I am using an array type to store the same 3
addresses. Is there any performance difference between the two approaches?
Which one would cause a performance issue?
Re: [PERFORM] performance issue in the fields.
Hello

2011/2/14 dba <dba...@gmail.com>:

> create table a (address1 int, address2 int, address3 int);
> create table b (address int[3]);
>
> I have created two tables. In the first table I am using separate fields
> to store 3 addresses; in table b I am using an array type to store the
> same 3 addresses. Is there any performance difference between the two
> approaches?

yes, there is. The planner cannot work well with foreign keys stored in an
array.

Regards

Pavel Stehule
[PERFORM] Field wise checking the performance.
I have two identical tables, but the widths of the fields are different. I
need to know whether changing from varchar(100) to varchar(30) will
increase performance, or whether it only affects memory usage.
Re: [PERFORM] Field wise checking the performance.
On 14.02.2011 14:06, dba wrote:

> I have two identical tables, but the widths of the fields are different.
> I need to know whether changing from varchar(100) to varchar(30) will
> increase performance, or whether it only affects memory usage.

It will make no difference. The max length is just a constraint on what
values can be stored; it doesn't affect how the strings are stored. In both
cases, the strings are stored in a variable-length format.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [PERFORM] How to boost performance of queries containing pattern matching characters
On 02/14/2011 12:59 AM, Gnanakumar wrote:

> QUERY: DELETE FROM MYTABLE WHERE EMAIL ILIKE '%domain.com%'
>
> The EMAIL column is VARCHAR(256).

Honestly? You'd be better off normalizing this column, and maybe hiding
that fact in a view if your app requires email as a single column. Split it
like this, so u...@gmail.com becomes:

email_acct   (user)
email_domain (gmail)
email_tld    (com)

This would let you drop the first % from your LIKE match, and then
traditional indexes would work just fine. You could also differentiate
between domains with different TLDs without using wildcards, which is
always faster.

I might ask why you are checking email for wildcards after the TLD in the
first place. Is it really so common that you are trying to match .com,
.com.au, .com.bar.baz.edu, or whatever? At the very least, splitting the
account from the domain+tld would be beneficial, as it would remove the
necessity of the first wildcard, which is really what's hurting you.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com

See http://www.peak6.com/email_disclaimer.php for terms and conditions
related to this email
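[Editorial aside: the normalization proposed above can be sketched outside the database. The helper below is an illustrative split into account/domain/tld, assuming well-formed addresses with a single-label TLD; real email parsing needs more validation, and multi-label TLDs like .com.au would need extra handling.]

```python
def split_email(email):
    """Split an address into the (account, domain, tld) columns suggested
    in the post, lowercased for case-insensitive matching."""
    acct, _, host = email.lower().partition("@")
    domain, _, tld = host.rpartition(".")
    return acct, domain, tld

print(split_email("user@gmail.com"))  # ('user', 'gmail', 'com')
```

With the parts stored separately, "all addresses at gmail.com" becomes an equality test on email_domain and email_tld, which ordinary btree indexes serve without any leading wildcard.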
Re: [PERFORM] How to boost performance of queries containing pattern matching characters
Gnanakumar wrote:

> Thanks for your suggestion. Our production server is currently running
> PostgreSQL v8.2.3. I think the pg_trgm contrib module is not available
> for the 8.2 series.

You're going to find that most of the useful answers here will not work on
8.2. Full-text search was not fully integrated into the database until 8.3.
Trying to run an app using it on 8.2 is going to be a constant headache for
you. Also, moving from 8.2 to 8.3 is a general performance boost in many
ways.

> Also, I read about WildSpeed - fast wildcard search for the LIKE
> operator. What is your opinion on that?
> http://www.sai.msu.su/~megera/wiki/wildspeed

WildSpeed works fine if you can handle the massive disk space and
maintenance overhead it introduces. I consider it useful only for data sets
that are almost static, where you can afford to build its large index
structure once and then use it to accelerate reads continuously, with
minimal updates.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
Re: [PERFORM] How to boost performance of queries containing pattern matching characters
On Feb 14, 2011, at 12:09 AM, Artur Zając wrote:

> Is using similarity really correct for this problem? How should the query
> be constructed so that it returns all records matching the ILIKE
> criteria?

If what you really want is the ability to select email addresses based on
subdomain, you might want to do this instead:

create index email_domain_idx
    on mytable (reverse(lower(split_part(email, '@', 2))));

Then you can do things like this ...

delete from mytable
    where reverse(lower(split_part(email, '@', 2))) = reverse('aol.com');

... to delete all aol.com users, or like this ...

delete from mytable
    where reverse(lower(split_part(email, '@', 2))) like reverse('%.aol.com');

... to delete all email addresses that are in a subdomain of aol.com.

You need a reverse() function to do that. Here's one in plpgsql:

CREATE OR REPLACE FUNCTION reverse(text) RETURNS text AS '
DECLARE
    original alias for $1;
    reverse_str text;
    i int4;
BEGIN
    reverse_str := '''';
    FOR i IN REVERSE LENGTH(original)..1 LOOP
        reverse_str := reverse_str || substr(original, i, 1);
    END LOOP;
    RETURN reverse_str;
END;' LANGUAGE 'plpgsql' IMMUTABLE;

(Normalizing the email address so that you store the local part and domain
part separately is even better, but an index on the reverse of the domain
is still useful for working with subdomains.)

Cheers,
Steve

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
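[Editorial aside: the reverse-the-domain trick above works because a suffix match on the domain becomes a prefix match on the reversed string, and prefix matches are what a btree index can answer. A small Python sketch of the same logic, with hypothetical function names:]

```python
def reverse_domain(email):
    """Python analogue of the index expression in the post:
    reverse(lower(split_part(email, '@', 2)))."""
    return email.lower().split("@", 1)[1][::-1]

def in_subdomain(email, parent):
    """The SQL pattern LIKE reverse('%.aol.com') becomes a prefix test on
    the reversed domain, e.g. startswith('moc.loa.')."""
    return reverse_domain(email).startswith(("." + parent)[::-1])

print(reverse_domain("bob@mail.aol.com"))        # 'moc.loa.liam'
print(in_subdomain("bob@mail.aol.com", "aol.com"))
print(in_subdomain("bob@aol.com.evil.org", "aol.com"))
```

Note the anchored dot prevents false positives: aol.com.evil.org ends with neither ".aol.com" nor matches the reversed prefix, so it is correctly rejected.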