Re: [PERFORM] choosing the right RAID level for PostgresQL database

2011-02-14 Thread david

On Sun, 13 Feb 2011, Dave Crooke wrote:


For any database, anywhere, the answer is pretty much always RAID-10.

The only time you would do anything else is for odd special cases.


there are two situations where you would opt for something other than 
RAID-10


1. if you need the space that raid 6 gives you compared to raid 10 you may 
not have much choice


2. if you do almost no updates to the disk during the time you are doing 
the reads then raid 6 can be at least as fast as raid 10 in non-degraded 
mode (it could be faster if you are able to use faster parts of the disks 
in raid 6 than you could in raid 10). degraded mode suffers more, but you 
can tolerate any 2 drives failing rather than just any 1 drive failing for 
raid 10 (the wrong two drives failing can kill a raid 10, while if the 
right drives fail you can loose a lot more drives in raid 10)


where raid 6 is significantly slower than raid 10 is when you are doing 
small random writes. Also many the performance variation between raid 
controllers will be much higher with raid 6 than with raid 10


David Lang


Cheers
Dave

On Sun, Feb 13, 2011 at 2:12 PM, sergey sergey.on@gmail.com wrote:


Hello,

I got a disk array appliance of 8 disks 1T each (UltraStor RS8IP4). It will
be used solely by PostgresQL database and I am trying to choose the best
RAID level for it.

The most priority is for read performance since we operate large data sets
(tables, indexes) and we do lots of searches/scans, joins and nested
queries. With the old disks that we have now the most slowdowns happen on
SELECTs.

Fault tolerance is less important, it can be 1 or 2 disks.

Space is the least important factor. Even 1T will be enough.

Which RAID level would you recommend in this situation. The current options
are 60, 50 and 10, but probably other options can be even better.

Thank you!






--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How to boost performance of queries containing pattern matching characters

2011-02-14 Thread Artur Zając
How can we boost performance of queries containing pattern matching 
characters?  In my case, we're using a percent sign (%) that matches any
string of  zero or more characters.

 QUERY:  DELETE FROM MYTABLE WHERE EMAIL ILIKE '%domain.com%'

 EMAIL column is VARCHAR(256).
 
 As it is clear from the above query, email is matched partially and
case-insensitively, which my application requirement demands.  
 
 In case, if it were a full match, I could easily define a functional 
 INDEX on EMAIL column (lower(EMAIL)) and I could rewrite my DELETE where
criteria like lower(EMAIL) = 'someem...@domain.com'.
 
 MYTABLE currently contains 2 million records and grows consistently.

I had almost the same problem.
To resolve it, I created my own text search parser (myftscfg) which divides
text in column into three letters parts, for example:

someem...@domain.com is divided to som, ome,mee,eem,ema,mai,ail,il@,
l@d,@do,dom,oma,mai,ain,in.,n.c,.co,com

There should be also index on email column:

CREATE INDEX email _fts on mytable using gin
(to_tsvector('myftscfg'::regconfig, email))

Every query like email ilike '%domain.com%' should be rewrited to:

WHERE
to_tsvector('myftscfg',email) @@ to_tsquery('dom') AND
to_tsvector('myftscfg',email) @@ to_tsquery('oma') AND
to_tsvector('myftscfg',email) @@ to_tsquery('mai') AND
to_tsvector('myftscfg',email) @@ to_tsquery('ain') AND
to_tsvector('myftscfg',email) @@ to_tsquery('in.') AND
to_tsvector('myftscfg',email) @@ to_tsquery('n.c') AND
to_tsvector('myftscfg',email) @@ to_tsquery('.co') AND
to_tsvector('myftscfg',email) @@ to_tsquery('com') AND email ILIKE
'%domain.com%';

Index is reducing number of records and clause email ILIKE '%domain.com%' is
selecting only valid records.

I didn't found better solution.

---
Artur Zajac


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How to boost performance of queries containing pattern matching characters

2011-02-14 Thread Gnanakumar
 The closest you can do is something like Artur's option (or the pg_trgm 
 module - handy since you are looking at 3-chars and up) to select likely 
 matches combined with a separate search on '%domain.com%' to confirm 
 that fact.

Thanks for your suggestion.  Our production server is currently running
PostgreSQL v8.2.3.  I think pg_trgm contrib module is not available for 8.2
series. 

Also, I read about WildSpeed - fast wildcard search for LIKE operator.  What
is your opinion on that?
http://www.sai.msu.su/~megera/wiki/wildspeed


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How to boost performance of queries containing pattern matching characters

2011-02-14 Thread Artur Zając
 Looks like you've almost re-invented the trigram module:
   http://www.postgresql.org/docs/9.0/static/pgtrgm.html

I didn't know about this module.
Idea to use three letters strings and use Full Text Search is the same, but
the rest is not.

Is the idea to use similarity for this problem is really correct? How should
be query constructed to return really all records matching ILIKE criteria?


---
Artur Zajac


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] performance issue in the fields.

2011-02-14 Thread dba


create table a( address1 int,address2 int,address3 int)
create table b(address int[3])

I have created two tables. In the first table i am using many fields to
store 3 address. 
as well as in b table, i am using array data type to store 3 address.  is
there any issue would face in performance related things which one will
cause the performance issue.
-- 
View this message in context: 
http://postgresql.1045698.n5.nabble.com/performance-issue-in-the-fields-tp3384307p3384307.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] performance issue in the fields.

2011-02-14 Thread Pavel Stehule
Hello

2011/2/14 dba dba...@gmail.com:


 create table a( address1 int,address2 int,address3 int)
 create table b(address int[3])

 I have created two tables. In the first table i am using many fields to
 store 3 address.
 as well as in b table, i am using array data type to store 3 address.  is
 there any issue would face in performance related things which one will
 cause the performance issue.

yes, there is. Planner can not to work well with foreign keys stored in array.

Regards

Pavel Stehule

 --
 View this message in context: 
 http://postgresql.1045698.n5.nabble.com/performance-issue-in-the-fields-tp3384307p3384307.html
 Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Field wise checking the performance.

2011-02-14 Thread dba


I have two identical tables. But the with of the fields are different. Need
to know whether changing from varchar(100) to varchar(30) will increase the
performance, or its just the memory access.
-- 
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Field-wise-checking-the-performance-tp3384348p3384348.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Field wise checking the performance.

2011-02-14 Thread Heikki Linnakangas

On 14.02.2011 14:06, dba wrote:

I have two identical tables. But the with of the fields are different. Need
to know whether changing from varchar(100) to varchar(30) will increase the
performance, or its just the memory access.


It will make no difference. The max length is just a constraint on what 
values can be stored, it doesn't affect how the strings are stored. In 
both cases, the strings are stored in a variable-length format.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How to boost performance of queries containing pattern matching characters

2011-02-14 Thread Shaun Thomas

On 02/14/2011 12:59 AM, Gnanakumar wrote:


QUERY:  DELETE FROM MYTABLE WHERE EMAIL ILIKE '%domain.com%'
EMAIL column is VARCHAR(256).


Honestly? You'd be better off normalizing this column and maybe hiding 
that fact in a view if your app requires email as a single column. Split 
it like this:


So u...@gmail.com becomes:

email_acct (user)
email_domain (gmail)
email_tld (com)

This would let you drop the first % on your like match and then 
traditional indexes would work just fine. You could also differentiate 
between domains with different TLDs without using wildcards, which is 
always faster.


I might ask why you are checking email for wildcards after the TLD in 
the first place. Is it really so common you are trying to match .com, 
.com.au, .com.bar.baz.edu, or whatever? At the very least, splitting the 
account from the domain+tld would be beneficial, as it would remove the 
necessity of the first wildcard, which is really what's hurting you.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com

__

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How to boost performance of queries containing pattern matching characters

2011-02-14 Thread Greg Smith

Gnanakumar wrote:

Thanks for your suggestion.  Our production server is currently running
PostgreSQL v8.2.3.  I think pg_trgm contrib module is not available for 8.2
series. 
  


You're going to find that most of the useful answers here will not work 
on 8.2.  Full-text search was not fully integrated into the database 
until 8.3.  Trying to run an app using it on 8.2 is going to be a 
constant headache for you.  Also, moving from 8.2 to 8.3 is just a 
general performance boost in many ways.




Also, I read about WildSpeed - fast wildcard search for LIKE operator.  What
is your opinion on that?
http://www.sai.msu.su/~megera/wiki/wildspeed
  


WildSpeed works fine if you can handle the massive disk space and 
maintenance overhead it introduces.  I consider it useful only for data 
sets that are almost static, where you can afford to build its large 
index structure once and then use it to accelerate reads continuously, 
with minimal updates.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How to boost performance of queries containing pattern matching characters

2011-02-14 Thread Steve Atkins

On Feb 14, 2011, at 12:09 AM, Artur Zając wrote:

 Looks like you've almost re-invented the trigram module:
  http://www.postgresql.org/docs/9.0/static/pgtrgm.html
 
 I didn't know about this module.
 Idea to use three letters strings and use Full Text Search is the same, but
 the rest is not.
 
 Is the idea to use similarity for this problem is really correct? How should
 be query constructed to return really all records matching ILIKE criteria?

If what you really want is the ability to select email addresses based on
subdomain, you might want to do this instead:

create email_domain_idx on mytable (reverse(lower(split_part(email, '@', 2;

Then you can do things like this ...

delete from mytable where reverse(lower(split_part(email, '@', 2))) = 
reverse('aol.com');

... to delete all aol.com users or like this ...

delete from mytable where reverse(lower(split_part(email, '@', 2))) like 
reverse('%.aol.com');

... to delete all email addresses that are in a subdomain of aol.com.

You need a reverse() function to do that. Here's one in plpgsql:

CREATE OR REPLACE FUNCTION reverse(text) RETURNS text AS '
DECLARE
   original alias for $1;
   reverse_str text;
   i int4;
BEGIN
 reverse_str = ;
 FOR i IN REVERSE LENGTH(original)..1 LOOP
  reverse_str = reverse_str || substr(original,i,1);
 END LOOP;
 return reverse_str;
END;'
LANGUAGE 'plpgsql' IMMUTABLE;

(Normalizing the email address so that you store local part and domain part 
separately is even better, but an index on the reverse of the domain is still 
useful for working with subdomains).

Cheers,
  Steve


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance