Re: [PERFORM] [ADMIN] Messed up time zones

2012-08-03 Thread Laszlo Nagy


So you took two distinct points in time, threw away some critical 
information, and are surprised why they are now equal?
Well, I did not want to throw away any information. The actual 
representation could be something like:


"2012-11-04 01:30:00-08 in Europe/Budapest, Winter time"

and

"2012-11-04 01:30:00-08 in Europe/Budapest, Summer time".

It would be unambiguous, everybody would know the time zone, the UTC 
offset and the time value, and conversion back to UTC would be 
unambiguous too.


I presumed that the representation was like that, but I was wrong. I have 
checked other programming languages, and as it turns out, nobody wants to 
change the representation just because there is one ambiguous hour every 
year. Now I think that most systems treat ambiguous timestamps as if they 
were in standard time. And who am I to go against the mainstream? 
I'm sorry; I admit that the problem was in my head.





Re: [PERFORM] [ADMIN] Messed up time zones

2012-08-03 Thread Laszlo Nagy


All the above are the exact same point in time merely stated as 
relevant to each location. Note that given a timestamp with time zone 
and a zone, PostgreSQL returns a timestamp without time zone (you know 
the zone since you specified it). 

Yes, I know the zone. But I don't know the offset from UTC.

Example:

template1=> set timezone to 'UTC';
SET
template1=> select ('2011-10-30 01:00:00'::timestamptz) at time zone 'Europe/Budapest';
      timezone
---------------------
 2011-10-30 02:00:00     -- Is it winter or summer time?
(1 row)

template1=> select ('2011-10-30 00:00:00'::timestamptz) at time zone 'Europe/Budapest';
      timezone
---------------------
 2011-10-30 02:00:00     -- Is it winter or summer time? What is the offset
                         -- from UTC here? Can you tell me when it was in UTC?
(1 row)

template1=>

What is more:

template1=> select (('2011-10-30 00:00:00'::timestamptz) at time zone 'Europe/Budapest')
            is distinct from
            (('2011-10-30 01:00:00'::timestamptz) at time zone 'Europe/Budapest');
 ?column?
----------
 f
(1 row)

template1=>

Yeah, we know what time zone it is in, but we don't know when it was, 
thanks a lot. :-( It would be unambiguous to store the UTC offset along 
with the value, but that is not how it was implemented.
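A hedged workaround sketch (standard PostgreSQL behaviour, no extensions assumed): keep the value as timestamptz and let the session time zone render it, because a timestamptz displayed in a named zone carries its UTC offset:

set timezone to 'Europe/Budapest';
select '2011-10-30 00:00:00+00'::timestamptz as summer_instant,  -- shown as 2011-10-30 02:00:00+02
       '2011-10-30 01:00:00+00'::timestamptz as winter_instant;  -- shown as 2011-10-30 02:00:00+01
-- The two values stay distinct, and the displayed offset tells winter and summer apart.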




Re: [PERFORM] ZFS vs. UFS

2012-07-31 Thread Laszlo Nagy


Which Intel RAID controller is that?  All of the ones on the 
motherboard are pretty much useless if that's what you have. Those are 
slower than software RAID and it's going to add driver issues you 
could otherwise avoid.  Better to connect the drives to the non-RAID 
ports or configure the controller in JBOD mode first.


Using one of the better RAID controllers, one of Dell's good PERC 
models for example, is one of the biggest hardware upgrades you could 
make to this server.  If your database is mostly read traffic, it 
won't matter very much.  Write-heavy loads really benefit from a good 
RAID controller's write cache.

Actually, it is a PERC with write-cache and BBU.


ZFS will heavily use server RAM for caching by default, much more so 
than UFS.  Make sure you check into that, and leave enough RAM for the 
database to run too.  (Doing *some* caching that way is good for 
Postgres; you just don't want *all* the memory to be used for that)
Right now, the size of the database is below 5GB, so I guess it will fit 
into memory. I'm concerned about data safety and availability. I have 
been in a situation where the RAID card went wrong and I was not able to 
recover the data because I could not get an identical RAID card in time. 
I have also been in a situation where the system was crashing twice 
a day and we didn't know why. (As it turned out, it was a bug in the 
"stable" kernel, and we could not identify it for two weeks.) On top of that, 
we had to run fsck after every crash. With a 10TB disk array, that was 
extremely painful. ZFS is much better: short recovery time, and it is 
RAID-card independent. So I think I have answered my own question - I'm 
going to use ZFS to have better availability, even if it leads to poorer 
performance. (That was the original question: how bad is it to use ZFS 
for PostgreSQL, instead of the native UFS.)


Moving disks to another server is a very low probability fix for a 
broken system.  The disks are a likely place for the actual failure to 
happen at in the first place.
Yes, but we don't have to worry about that. raidz2 + hot spare is safe 
enough. The RAID card is the only single point of failure.
I like to think more in terms of "how can I create a real-time replica 
of this data?" to protect databases, and the standby server for that 
doesn't need to be an expensive system.  That said, there is no reason 
to set things up so that they only work with that Intel RAID 
controller, given that it's not a very good piece of hardware anyway.
I'm not sure how to create a real-time replica. This database is updated 
frequently; there is always a process reading from and writing to it. 
I was thinking about using Slony to create slave databases, but I have 
no experience with that. We have a 100Mbit connection, and I'm not sure 
how much bandwidth we need to maintain a real-time slave database. It 
might be a good idea, though.
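A hedged sketch of the built-in alternative (assuming PostgreSQL 9.1 or later, where streaming replication and pg_basebackup ship in core; the settings below normally live in postgresql.conf):

-- on the primary:
--   wal_level = hot_standby
--   max_wal_senders = 2
-- plus a 'replication' entry in pg_hba.conf for the standby's address;
-- the standby is seeded with pg_basebackup and then follows the primary over a
-- single TCP connection. The WAL stream of a ~5GB OLTP database is normally
-- well within a 100Mbit link. On the primary, the standby's progress is visible in:
select client_addr, state, sent_location from pg_stat_replication;
-- (column names as of the 9.x releases)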


I'm sorry, I feel I'm being off-topic.



Re: [PERFORM] ZFS vs. UFS

2012-07-24 Thread Laszlo Nagy



On 24.07.2012 14:51, Laszlo Nagy wrote:


  * UFS is not journaled.


There is journal support for UFS as far as I know. Please have a look 
at the gjournal manpage.
Yes, but gjournal works on disk devices, so I would have to rely on the hw 
card for RAID. If the card goes wrong, I won't be able to access my data.


I could also buy an identical RAID card. In fact I could buy a complete 
backup server. But right now I don't have the money for that. So I would 
like to use a solution that allows me to recover from a failure even if 
the RAID card goes wrong.


It might also be possible to combine gmirror + gjournal, but that is not 
good enough. The performance and stability of a simple gmirror with two 
disks are much worse than those of a raidz array with 10 disks (and a hot 
spare), or even a RAID 1+0 (and a hot spare) supported by the hw RAID card.


So I would like to stick with either UFS + hw RAID (in which case I should 
buy an identical RAID card, if I can), or ZFS.






Re: [PERFORM] ZFS vs. UFS

2012-07-24 Thread Laszlo Nagy



> I wonder if UFS has better performance or not. Or can you suggest
> another fs? Just of the PGDATA directory.


Relying on physically moving a disk isn't a good backup/recovery 
strategy.  Disks are the least reliable single component in a modern 
computer.  You should figure out the best file system for your 
application, and separately figure out a recovery strategy, one that 
can survive the failure of *any* component in your system, including 
the disk itself.
This is why I use a RAID array of 10 disks, so there is no single point 
of failure. What else could I do? (Yes, I can make regular backups, but 
that is not the same. I can still lose data...)


[PERFORM] ZFS vs. UFS

2012-07-24 Thread Laszlo Nagy


  Hello,

Under FreeBSD 9, what filesystem should I use for PostgreSQL? (Dell 
PowerEdge 2900, 24G mem, 10x2T SATA2 disk, Intel RAID controller.)


 * ZFS is journaled, and it is more independent of the hardware. So if
   the computer fails, I can move the ZFS array to a different server.
 * UFS is not journaled. Also, I have to rely on the RAID card to build
   the RAID array. If there is a hw problem with it, then I won't be
   able to recover the data easily.

I wonder whether UFS has better performance or not. Or can you suggest 
another fs? This is just for the PGDATA directory.


Thanks,

   Laszlo



Re: [PERFORM] queries are fast after dump->restore but slow again after some days dispite vacuum

2012-07-20 Thread Laszlo Nagy

Are you running a lot of full table updates?

If you mean updates which are applied on every or almost every row of
the table - yes, it happens with two rather small tables of max. 10
000 rows. But they are both not touched by the query with this big
performance difference.
I'm not an expert, but would it help to change the fillfactor to about 45%? 
I'm just guessing that with fillfactor=45%, a full-table update could store 
the new row versions on the same pages as the old ones. Maybe I'm wrong.
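A minimal sketch of that suggestion (the table name is hypothetical; it assumes the frequently updated columns are not indexed, so HOT can reuse the freed space on each page):

alter table small_table set (fillfactor = 45);
-- the new fillfactor only applies to newly written pages; rewriting the table
-- (e.g. with CLUSTER, or VACUUM FULL on 9.0+) makes the existing pages honour it too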




[PERFORM] Why it is using/not using index scan?

2011-03-31 Thread Laszlo Nagy

For this query:

select pp.id,pp.product_id,pp.selling_site_id,pp.asin
from product_price pp
where
(pp.asin is not null and pp.asin<>'')
and (pp.upload_status_id<>1)
and pp.selling_site_id in (8,7,35,6,9)
and (pp.last_od < 'now'::timestamp - '1 week'::interval )
limit 5000

Query plan is:

"Limit  (cost=9182.41..77384.80 rows=3290 width=35)"
"  ->  Bitmap Heap Scan on product_price pp  (cost=9182.41..77384.80 rows=3290 width=35)"
"Recheck Cond: ((last_od < '2011-03-24 13:05:09.540025'::timestamp without time zone) AND (selling_site_id = ANY ('{8,7,35,6,9}'::bigint[])))"
"Filter: ((asin IS NOT NULL) AND (asin <> ''::text) AND (upload_status_id <> 1))"
"->  Bitmap Index Scan on idx_product_price_last_od_ss  (cost=0.00..9181.59 rows=24666 width=0)"
"  Index Cond: ((last_od < '2011-03-24 13:05:09.540025'::timestamp without time zone) AND (selling_site_id = ANY ('{8,7,35,6,9}'::bigint[])))"


For this query:

select pp.id,pp.product_id,pp.selling_site_id,pp.asin
from product_price pp
where
(pp.asin is not null and pp.asin<>'')
and (pp.upload_status_id<>1)
and pp.selling_site_id in (8,7,35,6,9)
and (pp.last_od + '1 week'::interval < 'now'::timestamp )
limit 5000

Query plan is:

"Limit  (cost=0.00..13890.67 rows=5000 width=35)"
"  ->  Seq Scan on product_price pp  (cost=0.00..485889.97 rows=174898 width=35)"
"Filter: ((asin IS NOT NULL) AND (asin <> ''::text) AND (upload_status_id <> 1) AND ((last_od + '7 days'::interval) < '2011-03-31 13:06:17.460013'::timestamp without time zone) AND (selling_site_id = ANY ('{8,7,35,6,9}'::bigint[])))"



The only difference is this: instead of (pp.last_od < 'now'::timestamp - '1 week'::interval) 
I have used (pp.last_od + '1 week'::interval < 'now'::timestamp).


The first query, with the index scan, returns in 440 msec. The second query, 
with the seq scan, takes about 22 seconds. So the first one is about 50x faster.


My concern is that we are working on a huge set of applications that use 
thousands of different queries on a database. There are programs that we 
wrote years ago. The database structure is continuously changing: we are 
adding new indexes and columns, and of course we are upgrading 
PostgreSQL when a new stable version comes out. There are cases when a 
change in a table affects 500+ queries in 50+ programs. I really did not 
think that I had to be THAT CAREFUL when writing conditions in SQL. Do 
I really have to manually analyze all those queries and "correct" 
conditions like this?


If so, then at least I would like to know if there is a documentation or 
wiki page where I can learn "how not to write conditions". I just 
figured out that I need to keep the indexed column alone on one side of a 
comparison and put the constant expressions on the other side, if possible. 
But there are probably other rules I wouldn't think of.
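One hedged workaround for queries that cannot be rewritten (the index name is made up): an expression index on the shifted column lets the second form use an index scan too, as long as the indexed expression matches the WHERE expression exactly:

create index idx_product_price_last_od_plus_week
    on product_price ((last_od + '1 week'::interval), selling_site_id);
-- the planner can then match (pp.last_od + '1 week'::interval < 'now'::timestamp)
-- against this index instead of falling back to a seq scan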


Might it be possible to change the optimizer so that it tries to move 
constant expressions over to one side of the comparison in the first place? 
That cannot be bad, right?


Thanks,

   Laszlo




Re: [PERFORM] Slow query on CLUSTER-ed tables

2011-03-25 Thread Laszlo Nagy



I suspect that, since the matched hid's probably aren't sequential,
many of those ~500 product_price_offer_history rows will be far apart
on disk.
OMG I was a fool! I'll CLUSTER on a different index and it will be fast, 
I'm sure.


Thanks!

   L




[PERFORM] Slow query on CLUSTER-ed tables

2011-03-23 Thread Laszlo Nagy

Given two tables:

CREATE TABLE product_price_history
(
  hid bigint NOT NULL,
  hdate timestamp without time zone NOT NULL,
  id bigint NOT NULL,
  product_id bigint NOT NULL,
  -- more columns here
  CONSTRAINT pk_product_price_history PRIMARY KEY (hid)
);

CREATE INDEX idx_product_price_history_id_hdate
  ON product_price_history
  USING btree
  (id, hdate);


CREATE TABLE product_price_offer_history
(
  hid bigint NOT NULL,
  product_price_id bigint NOT NULL,
  isfeatured smallint NOT NULL,
  price double precision NOT NULL,
  shipping double precision NOT NULL,
  -- some more columns here
  CONSTRAINT pk_product_price_offer_history PRIMARY KEY (hid, offerno)
);

Stats:

product_price_history - table size = 23GB, indexes size = 4GB, row count = 87 million
product_price_offer_history - table size = 24GB, indexes size = 7GB, row count = 235 million



These tables store historical data for a few million products from the 
last year.

The following commands are executed on them daily:

CLUSTER idx_product_price_history_id_hdate on product_price_history;
CLUSTER pk_product_price_offer_history on product_price_offer_history;

Here is a query:

select
  date_part('epoch', min(pph.hdate) )  as hdate_ticks,
  min(ppoh.price+ppoh.shipping) as price_plus_shipping
from
  product_price_history pph
  inner join product_price_offer_history ppoh on ppoh.hid = pph.hid
where pph.id = 37632081
 and ppoh.isfeatured=1
group by ppoh.merchantid,pph.hid,pph.hdate
order by pph.hid asc


I think that the query plan is correct:


"GroupAggregate  (cost=5553554.25..5644888.17 rows=2283348 width=50)"
"  ->  Sort  (cost=5553554.25..5559262.62 rows=2283348 width=50)"
"Sort Key: pph.hid, ppoh.merchantid, pph.hdate"
"->  Nested Loop  (cost=0.00..5312401.66 rows=2283348 width=50)"
"  ->  Index Scan using idx_product_price_history_id_hdate on product_price_history pph  (cost=0.00..8279.80 rows=4588 width=16)"
"Index Cond: (id = 37632081)"
"  ->  Index Scan using pk_product_price_offer_history on product_price_offer_history ppoh  (cost=0.00..1149.86 rows=498 width=42)"
"Index Cond: (ppoh.hid = pph.hid)"
"Filter: (ppoh.isfeatured = 1)"

So it uses two index scans on the indexes we CLUSTER the tables on. 
The number of rows returned is usually between 100 and 20,000.



Here is the problem. When I first run this query for a given 
identifier, it takes 100 seconds. When I run it again for the 
same identifier, it returns the same rows within one second!


The indexes are very selective: out of the 235 million rows, any given id 
occurs at most 20,000 times. It is a btree index, so it should 
already be stored sorted, and the 20,000 rows to be returned should fit 
into a few database pages. Even if they are not in the cache, PostgreSQL 
should be able to read the required pages within a second.


I understand that for an index scan, PostgreSQL also needs to read the 
rows from the table. But since these tables are CLUSTER-ed on those 
specific indexes, all the data needed should fit on a few database pages 
and PostgreSQL should be able to read them within a second.


Then why does the query take 100 seconds the first time and just one 
second the second time? Probably my thinking is wrong, but I suppose it 
means that the data is spread across thousands of pages on the disk.


How is that possible? What am I doing wrong?

Thanks,

   Laszlo




[PERFORM] Bad query plan when the wrong data type is used

2011-02-08 Thread Laszlo Nagy

This query:

select p.id,p.producer_id,visa.variation_item_id, vi.qtyavail
from  variation_item_sellingsite_asin visa
inner join product p on p.id = visa.product_id
inner join variation_item vi on vi.id = 
visa.variation_item_id

where visa.id =4

runs in 43 msec. The "visa.id" column has the int4 data type. The query plan 
uses an index condition:


"Nested Loop  (cost=0.00..26.19 rows=1 width=28)"
"  ->  Nested Loop  (cost=0.00..17.75 rows=1 width=24)"
"->  Index Scan using variation_item_sellingsite_asin_pkey on variation_item_sellingsite_asin visa  (cost=0.00..8.58 rows=1 width=16)"
"  Index Cond: (id = 4)"
"->  Index Scan using pk_product_id on product p  (cost=0.00..9.16 rows=1 width=16)"
"  Index Cond: (p.id = visa.product_id)"
"  ->  Index Scan using pk_variation_item_id on variation_item vi  (cost=0.00..8.43 rows=1 width=12)"
"Index Cond: (vi.id = visa.variation_item_id)"


This query:

select p.id,p.producer_id,visa.variation_item_id, vi.qtyavail
from  variation_item_sellingsite_asin visa
inner join product p on p.id = visa.product_id
inner join variation_item vi on vi.id = 
visa.variation_item_id

where visa.id =4.0

Runs for 1144 msec! The query plan uses a seq scan + filter:

"Nested Loop  (cost=33957.27..226162.68 rows=14374 width=28)"
"  ->  Hash Join  (cost=33957.27..106190.76 rows=14374 width=20)"
"Hash Cond: (visa.variation_item_id = vi.id)"
"->  Seq Scan on variation_item_sellingsite_asin visa  (cost=0.00..71928.04 rows=14374 width=16)"
"  Filter: ((id)::numeric = 4.0)"
"->  Hash  (cost=22026.01..22026.01 rows=954501 width=12)"
"  ->  Seq Scan on variation_item vi  (cost=0.00..22026.01 rows=954501 width=12)"
"  ->  Index Scan using pk_product_id on product p  (cost=0.00..8.33 rows=1 width=16)"
"Index Cond: (p.id = visa.product_id)"


Which is silly. I think that PostgreSQL casts the int side to numeric 
(as the Filter line shows), and then compares the two.


It would be better to do this for each such comparison:

   * evaluate the right side (which is numeric)
   * tell whether it is an integer or not
   * if it is not an integer, then no row can match, so stop immediately
   * otherwise use its integer value for the index scan

The result is identical, but it makes it possible to use the index scan. Of 
course, I know that the query itself is wrong, because I should not use a 
fractional literal where an int is expected. But this CAN be optimized, so I 
think it should be! My idea for the query optimizer is not to use the "wider" 
data type, but to use the data type that has an index on it instead.
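In the meantime, a hedged illustration of the workaround (the same query, only the literal changed): casting the value back to the column's type restores the index condition.

select p.id, p.producer_id, visa.variation_item_id, vi.qtyavail
from variation_item_sellingsite_asin visa
inner join product p on p.id = visa.product_id
inner join variation_item vi on vi.id = visa.variation_item_id
where visa.id = 4.0::int4;  -- the cast turns the comparison back into int4 = int4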


(I spent an hour figuring out what was wrong with my program. In some 
cases it was slow, in other cases it was really fast, and I never got an 
error message.)


What do you think?

   Laszlo



[PERFORM] Get master-detail relationship metadata

2011-02-03 Thread Laszlo Nagy


  Hi All,

I'm working on a client program that iterates over master-detail 
relationships in a loop chain.


Pseudo code:

for row_1 in table_1:
    table_2 = get_details(row_1, "table2")
    for row_2 in table_2:
        row_3 = get_details(row_2, "table3")
        # ... etc.
        process_data(row_1, row_2, row_3, ...)

My task is to write the "get_details" iterator efficiently. The obvious 
way to do it is to run a query in every get_details() call, but that 
is not efficient: we have relationships where one master only has a few 
details, so for 1 million master rows that would result in millions of 
SQL SELECT commands, degrading performance by orders of magnitude. My 
idea was that the iterator should pre-fetch and cache data for many 
master records at once. get_details() would then use the cached rows, 
reducing the number of SQL SELECT statements needed. I actually wrote 
the iterator, and it works fine in some cases. For example:


producers = get_rows("producer")
for producer in producers:
    products = get_details(producer, "product")
    for product in products:
        prices = get_details(product, "prices")
        for price in prices:
            process_product_price(producer, product, price)

This works fine if one producer has no more than 1000 products and one 
product has no more than 10 prices; I can easily keep 10,000 records in 
memory. The actual code executes about 15 SQL queries while iterating 
over 1 million rows. Compared to the original "obvious" method, 
performance is increased roughly 15-fold.


But sometimes it just doesn't work. If a producer has 1 million 
products, and one product has 100 prices, then it fails, because I 
cannot keep 100 million prices in memory. My program should somehow 
figure out how many rows it will get for one master, and choose between 
the cached and non-cached methods accordingly.


So here is the question: is there a way to get this information from 
PostgreSQL itself? I know that the query plan contains information about 
this, but I'm not sure how to extract it. Should I run an EXPLAIN command 
of some kind, and parse the result as a string? For example:


EXPLAIN select * from product where producer_id=1008;
                              QUERY PLAN
----------------------------------------------------------------------
 Seq Scan on product  (cost=0.00..1018914.74 rows=4727498 width=1400)
   Filter: (producer_id = 1008)
(2 rows)


Then I could extract "rows=4727498" to get an idea of how many detail 
rows I'll get for the master.


Is there any better way to do it? And how reliable is this?
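A hedged alternative to parsing the EXPLAIN text (it uses only the standard pg_class and pg_stats catalogs; newer releases also offer EXPLAIN (FORMAT JSON), which is easier to parse): the average number of detail rows per distinct master key can be computed from the planner statistics directly:

select c.reltuples
       / nullif(case when s.n_distinct < 0
                     then -s.n_distinct * c.reltuples  -- negative values mean a fraction of the row count
                     else s.n_distinct
                end, 0) as est_rows_per_master
from pg_class c
join pg_stats s on s.tablename = c.relname
where c.relname = 'product'
  and s.attname = 'producer_id';
-- This is only as reliable as the statistics collected by ANALYZE - the same
-- caveat that applies to the row estimate printed by EXPLAIN.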


Thanks,

   Laszlo




Re: [PERFORM] Slow query + why bitmap index scan??

2011-01-12 Thread Laszlo Nagy

On 2011-01-12 15:36, Kevin Grittner wrote:

Laszlo Nagy  wrote:


shared_mem = 6GB
work_mem = 512MB
total system memory=24GB


In addition to the good advice from Ken, I suggest that you set
effective_cache_size (if you haven't already).  Add whatever the OS
shows as RAM used for cache to the shared_mem setting.
It was 1GB. Now I have changed it to 2GB. Although the OS shows 9GB of 
inactive memory, we have many concurrent connections to the database 
server. I hope it is okay to use 2GB.


But yeah, for your immediate problem, if you can cluster the table
on the index involved, it will be much faster.  Of course, if the
table is already in a useful order for some other query, that might
get slower, and unlike some other products, CLUSTER in PostgreSQL
doesn't *maintain* that order for the data as new rows are added --
so this should probably become a weekly (or monthly or some such)
maintenance operation.
Thank you! After clustering, queries are really fast. I'm not worried 
about other queries; this is the only way we use this table - getting 
details for a given id value. I put the CLUSTER command into a cron 
script that runs daily. The second run took 2 minutes, so I guess it 
will be fine.


Thank you for your help.

   Laszlo




Re: [PERFORM] Slow query + why bitmap index scan??

2011-01-12 Thread Laszlo Nagy

On 2011-01-12 14:42, Florian Weimer wrote:

* Laszlo Nagy:


This query:

select hid from product_price_history where id=35547581

Returns 759 rows in 8837 msec! How can this be that slow???

If most records are on different heap pages, processing this query
requires many seeks.  11ms per seek is not too bad if most of them are
cache misses.

How about this:

select id,hdate from product_price_history where id=35547581  -- 759 rows, 8837 ms

Query time average: 3 sec.
Query plan:

"Bitmap Heap Scan on product_price_history  (cost=13.91..1871.34 rows=474 width=16)"
"  Recheck Cond: (id = 35547582)"
"  ->  Bitmap Index Scan on idx_product_price_history_id_hdate  (cost=0.00..13.79 rows=474 width=0)"
"Index Cond: (id = 35547582)"

Why is there still a heap scan here? All fields in the query are in the 
index... Wouldn't a simple index scan be faster? (This is only a 
theoretical question; I'm just curious.)


My first idea to speed things up is to cluster this table regularly. 
That would gather (most of) the rows for a given id onto a few pages. 
Fewer page reads -> faster query. Is it a good idea?
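A minimal sketch of that idea (assuming the 8.3+ CLUSTER syntax):

CLUSTER product_price_history USING idx_product_price_history_id_hdate;
ANALYZE product_price_history;
-- CLUSTER takes an exclusive lock and rewrites the whole table, and the physical
-- order is not maintained for newly inserted rows, so it has to be repeated
-- periodically.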


Another question. Do you think that increasing shared_mem would make it 
faster?


Currently we have:

shared_mem = 6GB
work_mem = 512MB
total system memory=24GB

Total database size is about 30GB, but there are other programs running on 
the system, and many other tables.


Thanks,

   Laszlo




[PERFORM] Slow query + why bitmap index scan??

2011-01-12 Thread Laszlo Nagy

This will be a simple question to answer. :-) There is a single table:

select count(*) from product_price_history  -- 12982555 rows

This table has exactly one index and one primary key constraint:

CREATE INDEX idx_product_price_history_id_hdate
  ON product_price_history
  USING btree
  (id, hdate);

ALTER TABLE product_price_history
  ADD CONSTRAINT pk_product_price_history PRIMARY KEY(hid);

No other constraints or indexes are defined on this table. Rows are never 
updated or deleted in this table, only inserted. It was 
vacuumed and reindexed today.


Stats on the table:

seq scans=13, index scans=108, table size=3770MB, toast table size=8192 
bytes, indexes size=666MB


This query:

select hid from product_price_history where id=35547581

Returns 759 rows in 8837 msec! How can this be that slow???

The query plan is:

"Bitmap Heap Scan on product_price_history  (cost=13.90..1863.51 rows=472 width=8)"
"  Recheck Cond: (id = 35547581)"
"  ->  Bitmap Index Scan on idx_product_price_history_id_hdate  (cost=0.00..13.78 rows=472 width=0)"
"Index Cond: (id = 35547581)"

I don't understand why PostgreSQL uses a bitmap heap scan + bitmap index 
scan. Why not just use a regular index scan? Data in a btree index is 
already sorted, so a normal index scan should take no more than a few page 
reads. This should never take 8 seconds.


Thanks,

   Laszlo




Re: [PERFORM] SSD + RAID

2009-11-15 Thread Laszlo Nagy



- Pg doesn't know the erase block sizes or positions. It can't group
writes up by erase block except by hoping that, within a given file,
writing in page order will get the blocks to the disk in roughly
erase-block order. So your write caching isn't going to do anywhere near
as good a job as the SSD's can.
  

Okay, I see. We cannot query erase block size from an SSD drive. :-(

I don't think that any SSD drive has more than some
megabytes of write cache.



The big, lots-of-$$ ones have HUGE battery backed caches for exactly
this reason.
  

Heh, this is why they are so expensive. :-)

The same amount of write cache could easily be
implemented in OS memory, and then Pg would always know what hit the disk.



Really? How does Pg know what order the SSD writes things out from its
cache?
  
I got the point. We cannot implement an efficient write cache without 
much more knowledge about how that particular drive works.


So... the only solution that works well is to have much more RAM for 
read cache, and much more RAM for write cache inside the RAID controller 
(with BBU).


Thank you,

  Laszlo




Re: [PERFORM] SSD + RAID

2009-11-15 Thread Laszlo Nagy



A change has been written to the WAL and fsync()'d, so Pg knows it's hit
disk. It can now safely apply the change to the tables themselves, and
does so, calling fsync() to tell the drive containing the tables to
commit those changes to disk.

The drive lies, returning success for the fsync when it's just cached
the data in volatile memory. Pg carries on, shortly deleting the WAL
archive the changes were recorded in or recycling it and overwriting it
with new change data. The SSD is still merrily buffering data to write
cache, and hasn't got around to writing your particular change yet.
  
All right, I believe you. With the current Pg implementation, I need to 
turn off the disk cache.


But I would like to ask some theoretical questions. It is just an 
idea of mine, and I'm probably wrong.

Here is a scenario:

#1. The user wants to change something, resulting in a write_to_disk(data) call.
#2. The data is written into the WAL and fsync()-ed.
#3. At this point the write_to_disk(data) call CAN RETURN, and the user can 
continue working (the WAL is already written, so the changes cannot be lost).
#4. Pg can continue writing the data onto the disk, and fsync() it.
#5. Then the WAL archive data can be deleted.

Now maybe I'm wrong, but between #3 and #5, the data to be written is 
kept in memory. This is basically a write cache, implemented in OS 
memory. We could really handle it like a write cache. E.g. everything 
would remain the same, except that we add some latency. We can wait some 
time after the last modification of a given block, and then write it out.


Is it possible to do? If so, then we could turn off the write cache for 
all drives except the one holding the WAL, and the write speed would 
still remain the same. I don't think that any SSD drive has more than a 
few megabytes of write cache. The same amount of write cache could easily 
be implemented in OS memory, and then Pg would always know what had hit the disk.


Thanks,

  Laci




Re: [PERFORM] SSD + RAID

2009-11-14 Thread Laszlo Nagy




   * I could buy two X25-E drives and have 32GB disk space, and some
 redundancy. This would cost about $1600, not counting the RAID
 controller. It is on the edge.
This was the solution I went with (4 drives in a raid 10 actually). 
Not a cheap solution, but the performance is amazing.


I came across this article:

http://www.mysqlperformanceblog.com/2009/03/02/ssd-xfs-lvm-fsync-write-cache-barrier-and-lost-transactions/ 



It's from a Linux MySQL user so it's a bit confusing but it looks like 
he has some reservations about performance vs reliability of the Intel 
drives - apparently they have their own write cache and when it's 
disabled performance drops sharply.
OK, I'm getting confused here. There is the WAL, which is written 
sequentially. If the WAL is not corrupted, then it can be replayed at the 
next database startup. Somebody please enlighten me! In my mind, fsync 
is only needed for the WAL. If I could configure PostgreSQL to put the 
WAL on a real hard drive that has a BBU and write cache, then I cannot 
lose data. Meanwhile, the product table data could be placed on the SSD 
drive, and I should be able to turn on its write cache safely. Am I wrong?


 L




Re: [PERFORM] SSD + RAID

2009-11-14 Thread Laszlo Nagy

Robert Haas wrote:

2009/11/14 Laszlo Nagy :
  

32GB is for one table only. This server runs other applications, and you
need to leave space for sort memory, shared buffers etc. Buying 128GB memory
would solve the problem, maybe... but it is too expensive. And it is not
safe. Power out -> data loss.

I'm sorry, I thought he was talking about keeping the database in memory 
with fsync=off. Now I see he was only talking about the OS disk cache.


My server has 24GB RAM, and I cannot easily expand it unless I throw out 
some 2GB modules and buy 4GB or 8GB modules instead. But... buying 4x8GB 
ECC RAM (and throwing out 4x2GB) is a lot more expensive than buying 
some 64GB SSD drives. 95% of the table in question is never modified, only 
read (mostly with index scans). Only 5% is actively updated.


This is why I think using an SSD would be effective in my case.

Sorry for the confusion.

 L




Re: [PERFORM] SSD + RAID

2009-11-14 Thread Laszlo Nagy

Heikki Linnakangas wrote:

Laszlo Nagy wrote:
  

   * I need at least 32GB disk space. So DRAM based SSD is not a real
 option. I would have to buy 8x4GB memory, costs a fortune. And
 then it would still not have redundancy.



At 32GB database size, I'd seriously consider just buying a server with
a regular hard drive or a small RAID array for redundancy, and stuffing
16 or 32 GB of RAM into it to ensure everything is cached. That's tried
and tested technology.
  
32GB is for one table only. This server runs other applications, and you 
need to leave space for sort memory, shared buffers etc. Buying 128GB 
memory would solve the problem, maybe... but it is too expensive. And it 
is not safe. Power out -> data loss.

I don't know how you came to the 32 GB figure, but keep in mind that
administration is a lot easier if you have plenty of extra disk space
for things like backups, dumps+restore, temporary files, upgrades etc.
  
This disk space would be dedicated to a smaller tablespace, holding one 
or two bigger tables accessed mostly with index scans. Of course I would 
never use an SSD disk for storing database backups; that would be a waste 
of money.



 L




Re: [PERFORM] SSD + RAID

2009-11-13 Thread Laszlo Nagy



Note that some RAID controllers (3Ware in particular) refuse to
recognize the MLC drives, in particular, they act as if the OCZ Vertex
series do not exist when connected.

I don't know what they're looking for (perhaps some indication that
actual rotation is happening?) but this is a potential problem make
sure your adapter can talk to these things!

BTW I have done some benchmarking with Postgresql against these drives
and they are SMOKING fast.
  
I was thinking about an Areca 1320 with 2GB memory + BBU. Unfortunately, I 
cannot find information about using Areca cards with SSD drives, and I'm 
also not sure how they would work together. I guess RAID cards are 
optimized for conventional disks: they read/write data in bigger blocks 
and they optimize the order of reads/writes for physical cylinders. I 
know for sure that this particular Areca card has an Intel dual-core I/O 
processor and its own embedded operating system. I guess it could be 
tuned for SSD drives, but I don't know how.


I was hoping that with a RAID 6 setup, write speed (which is slower for 
cheaper flash-based SSD drives) would dramatically increase, because the 
information is written to 10 drives simultaneously. With a very small 
block size, that would probably be true. But... what if the RAID card uses 
bigger block sizes, and - say - I want to update much smaller blocks in 
the database?


My other option is to buy two SLC SSD drives and use RAID 1. It would 
cost about the same, but with less redundancy and less capacity. Which is 
faster: 8-10 MLC disks in RAID 6 with a good caching controller, or 
two SLC disks in RAID 1?


Thanks,

  Laszlo




[PERFORM] SSD + RAID

2009-11-13 Thread Laszlo Nagy

Hello,

I'm about to buy SSD drive(s) for a database. For decision making, I 
used this tech report:


http://techreport.com/articles.x/16255/9
http://techreport.com/articles.x/16255/10

Here are my concerns:

   * I need at least 32GB disk space. So DRAM based SSD is not a real
 option. I would have to buy 8x4GB memory, costs a fortune. And
 then it would still not have redundancy.
   * I could buy two X25-E drives and have 32GB disk space, and some
 redundancy. This would cost about $1600, not counting the RAID
 controller. It is on the edge.
   * I could also buy many cheaper MLC SSD drives. They cost about
 $140. So even with 10 drives, I'm at $1400. I could put them in
 RAID6, have much more disk space (256GB), high redundancy and
 POSSIBLY good read/write speed. Of course then I need to buy a
 good RAID controller.

My question is about the last option. Are there any good RAID cards that 
are optimized (or can be optimized) for SSD drives? Do any of you have 
experience in using many cheaper SSD drives? Is it a bad idea?


Thank you,

  Laszlo




Re: [PERFORM] random_page_cost for tablespace

2009-11-09 Thread Laszlo Nagy

Robert Haas wrote:

2009/11/9 Laszlo Nagy :
  

We have a bigger table with a few million rows. The number of index scans is
high, the number of seq reads is low. This table is often joined with
others... so we want to buy a new SSD drive, create a tablespace on it
and put this big table on it. Random read speed on an SSD is identical to
seq read speed. However, I need to tell the optimizer that random_page_cost is
lower for the new tablespace. Is there a way to do it?



I happen to be working on a patch for this exact feature.  However,
even assuming it gets in, that means waiting for 8.5.
  

That will be a very nice feature. Thank you! :-)
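(A hedged follow-up sketch: the feature did arrive in a later release - PostgreSQL 9.0 and up - where the request above becomes a one-liner; the tablespace name and path are hypothetical:)

CREATE TABLESPACE ssd_space LOCATION '/ssd/pgdata';  -- the directory must already exist
ALTER TABLESPACE ssd_space SET (random_page_cost = 1.1);
-- tables and indexes placed in ssd_space are then costed with the lower
-- random_page_cost, while everything else keeps the global default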




[PERFORM] random_page_cost for tablespace

2009-11-09 Thread Laszlo Nagy


 Hi All,

We have a bigger table with a few million rows. The number of index scans is
high, the number of seq reads is low. This table is often joined with
others... so we want to buy a new SSD drive, create a tablespace on it
and put this big table on it. Random read speed on an SSD is identical to
seq read speed. However, I need to tell the optimizer that random_page_cost is
lower for the new tablespace. Is there a way to do it?

Thanks,

  Laszlo





Re: [PERFORM] Why is my stats collector so busy?

2009-06-10 Thread Laszlo Nagy

Tom Lane wrote:

Laszlo Nagy  writes:
  
On a 8 processor system, my stats collector is always at 100% CPU. 



What platform?  What Postgres version?

regards, tom lane

  

8.3.5 on FreeBSD 7.0 amd64



Re: [PERFORM] Why is my stats collector so busy?

2009-06-08 Thread Laszlo Nagy




What version of Postgres are you using?

  


8.3.5 on FreeBSD amd64




[PERFORM] Why is my stats collector so busy?

2009-06-05 Thread Laszlo Nagy
On an 8-processor system, my stats collector is always at 100% CPU, while 
disk I/O is very low. We have many databases and they are accessed 
frequently. Sometimes there are big table updates, but most of the time 
only simple queries are run against the databases, returning only a few 
records. Out of the maximum possible 8.0 system load, the average load is 
always above 1.1, and of this, 1.0 is the stats collector and 0.1 is the 
rest of the system. If I restart the PostgreSQL server, then the stats 
collector uses 0% CPU for about 10 minutes, then goes up to 100% again. 
Is there a way to tell why it is working so hard?


I asked about this problem some months ago on a different mailing list. I was 
asked to provide tracebacks of the stats collector, but due to a bug in 
the FreeBSD getppid() function, I'm not able to trace the stats collector.


Thank you,

  Laszlo




[PERFORM] Partial index usage

2009-02-16 Thread Laszlo Nagy

Hi All,

I have these indexes on a table:


CREATE INDEX uidx_product_partno_producer_id
 ON product
 USING btree
 (partno, producer_id);


CREATE INDEX idx_product_partno
 ON product
 USING btree
 (partno);

Can I safely delete the second one? Will PostgreSQL use 
(partno, producer_id) when it only needs to order by partno? (partno is a 
text field, producer_id is int4.) Index sizes: 172MB and 137MB. I guess 
if I only had one index, it would save memory and increase performance.


Another pair of indexes, 144MB and 81MB respectively:


CREATE INDEX idx_product_producer_uploads
 ON product
 USING btree
 (producer_id, am_upload_status_id);


CREATE INDEX idx_product_producer_id
 ON product
 USING btree
 (producer_id);


am_upload_status_id is also an int4. Can I delete the second index 
without performance drawback?


Thanks,

  Laszlo




[PERFORM] Big index sizes

2008-12-30 Thread Laszlo Nagy
We have several tables where the index size is much bigger than the 
table size.


Example:

select count(*) from product_price -- 2234244

Table size: 400 MB
Index size: 600 MB

After executing "reindex table product_price", the index size was reduced to 269MB.

I believe this affects performance.

Vacuuming a table does not rebuild the indexes, am I right? I'm not sure 
whether I need to do this manually, or whether this is the result of another 
problem. (For example, too many open transactions, or frequent updates?)
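A hedged sketch of how an index can be rebuilt without blocking writers (REINDEX itself takes an exclusive lock; the index and column names below are made up for illustration):

CREATE INDEX CONCURRENTLY idx_product_price_product_id_new
    ON product_price (product_id);            -- hypothetical column
DROP INDEX idx_product_price_product_id;      -- the bloated original (hypothetical name)
ALTER INDEX idx_product_price_product_id_new
    RENAME TO idx_product_price_product_id;
-- Routine (auto)vacuum keeps bloat bounded but normally never shrinks an index;
-- only REINDEX or a rebuild like this gives the space back.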



Thanks




Re: [PERFORM] Slow table update - SOLVED!

2008-12-29 Thread Laszlo Nagy




In 8.3 the HOT feature may help if the columns being updated are 
indexed ... what version of PostgreSQL is this again ? (Forgive my 
lack of memory -- the last few days I've forgotten a lot, heh heh.)



8.3.5.

The column that was being updated is part of only one small index.



Any chances to reduce those to a bare minimum, perhaps using 
conditional index strategies or even some form of replication, so the 
primary uses indexes related to the updates and the mirror uses 
indexes related to the read-only / reporting needs ? Perhaps some form 
of staging table with no indexes to load, check data, etc. and then 
insert.


Any way to reduce those ? Check the usage via the system stats on 
table/index use and try removing some and testing to see what makes a 
difference.


We tried removing all indexes on a test system and the update was 
speedy. We are going to try to reduce the row size: move the static 
description/name/textual data into a separate table and leave the 
frequently updated data in the original one. We tested this theoretical 
version:


Query returned successfully: 182752 rows affected, 56885 ms execution time.

This is much faster. However, this table is used by hundreds of 
programs, so the split will take some work. Anyway, I got the answer to my question.


Thank you!

  Laszlo




Re: [PERFORM] Slow table update

2008-12-29 Thread Laszlo Nagy


My other idea was that there are so many indexes on this table, maybe 
the update is slow because of the indexes?



Updating indexes is certainly very far from being free.  How many is
"many"?
  

Number of indexes = 15.

3 indexes are on "text" type columns, 500MB in size each.
The others are on int8 and timestamp columns, ca. 200MB each.





Re: [PERFORM] [ADMIN] rebellious pg stats collector (reopened case)

2008-12-22 Thread Laszlo Nagy

Posted to the wrong list by mistake. Sorry.



Re: [PERFORM] [ADMIN] rebellious pg stats collector (reopened case)

2008-12-22 Thread Laszlo Nagy



and see if its output changes when you start to trace it.
  

%cat test.c
#include <stdio.h>     /* printf */
#include <unistd.h>    /* sleep, getppid */

int main() {
    while(1) {
        sleep(5);
        printf("ppid = %d\n", getppid());
    }
}

%gcc -o test test.c
%./test
ppid = 47653
ppid = 47653
ppid = 47653 # Started "truss -p 48864" here!
ppid = 49073
ppid = 49073
ppid = 49073


Agreed, but we need to understand what the tools being used to
investigate the problem are doing ...
  

Unfortunately, I'm not able to install strace:

# pwd
/usr/ports/devel/strace
# make
===>  strace-4.5.7 is only for i386, while you are running amd64.
*** Error code 1

Stop in /usr/ports/devel/strace.

I'll happily install any trace tool, but have no clue which one would help.






Re: [PERFORM] Slow table update

2008-12-22 Thread Laszlo Nagy
I just tested the same thing on a test machine. It only has one processor, 
1GB memory, and one SATA disk. The same "select count(*)" took 58 seconds. 
I started the same UPDATE with EXPLAIN ANALYZE; it has been running for 
1000 seconds now. I'm now 100% sure that the problem is with the database, 
because this machine has nothing but a PostgreSQL server running on it. 
I'll post the output of EXPLAIN ANALYZE later.





Re: [PERFORM] Slow table update

2008-12-22 Thread Laszlo Nagy




If the table has some sort of FK relations it might be being slowed by 
the need to check a row meant to be deleted has any children.


If you look at my SQL, there is only one column to be updated. That 
column has no foreign key constraint. (It should have one, but we left 
that constraint out in order to speed up updates.)



Perhaps triggers ?


Table "product" has no triggers.



If the table is very bloated with lots of dead rows (but you did say 
you vacuum frequently and check the results to make sure they are 
effective?) that would slow it down.


I'm not sure how to check if the vacuum was effective. But we have 
max_fsm_pages=100 in postgresql.conf, and I do not get any errors 
from the daily vacuum script, so I presume that the table hasn't got too 
many dead rows.
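A hedged way to check it directly (pgstattuple is a contrib module, so this assumes it has been installed; on 9.1+ that is CREATE EXTENSION pgstattuple, on 8.x the contrib SQL script):

select table_len, dead_tuple_percent, free_percent
from pgstattuple('product');
-- a high dead_tuple_percent would mean routine vacuuming is not keeping up,
-- a low one rules bloat out as the cause of the slow update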


Anyway, the table size is only 4GB. Even if half of the rows are dead, 
the update should run quite quickly. Another argument is that when I 
"select count(*)" instead of "UPDATE", then I get the result in 10 
seconds. I don't think that dead rows can make such a big difference 
between reading and writing.


My other idea was that there are so many indexes on this table, so maybe 
the update is slow because of the indexes? The column being updated has 
only one index on it, and that is 200MB. But I have heard somewhere that 
because of PostgreSQL's multi-version system, the system sometimes needs 
to insert new entries even into indexes on columns that are not being 
updated. I'm not sure. Might this be the problem?



A long running transaction elsewhere that is blocking the delete ? Did 
you check the locks ?


Sorry, this was an update. A blocking transaction would never explain 
why the disk I/O went up to 100% for 2600 seconds.


  L




Re: [PERFORM] Slow table update

2008-12-22 Thread Laszlo Nagy

Laszlo Nagy wrote:

SQL:

update product set sz_category_id=null where am_style_kw1 is not null 
and sz_category_id is not null

Hmm, this query:

select count(*) from product where am_style_kw1 is not null and 
sz_category_id is not null and sz_category_id<>4809


finishes in 10 seconds. The update would not finish in 2600 seconds. I 
don't understand.


L




[PERFORM] Slow table update

2008-12-22 Thread Laszlo Nagy

SQL:

update product set sz_category_id=null where am_style_kw1 is not null 
and sz_category_id is not null


query plan:

"Seq Scan on product  (cost=0.00..647053.30 rows=580224 width=1609)"
"  Filter: ((am_style_kw1 IS NOT NULL) AND (sz_category_id IS NOT NULL))"

Information on the table:

row count ~ 2 million
table size: 4841 MB
toast table size: 277 MB
indexes size: 4434 MB

Computer: FreeBSD 7.0 stable, dual Xeon quad-core 5420 2.5GHz, 8GB memory, 
6 ES SATA disks in hw RAID 6 (+2GB write-back cache) for the database.


Autovacuum is enabled. We also perform "vacuum analyze" on the database, 
each day.


Here are some non-default values from postgresql.conf:

shared_buffers=400MB
maintenance_work_mem = 256MB
max_fsm_pages = 100

There was almost no load on the machine (CPU: mostly idle, IO: approx. 
5% total) when we started this update.


Maybe I'm wrong about this, but here is a quick calculation: the RAID 
array should do at least 100MB/sec, so reading the whole table should not 
take more than 1 minute. I think about 20% of the rows should have been 
updated. Writing out all the changes should not take too much time either. 
I believe that this update should have been completed within 2-3 minutes.


In reality, I cancelled the query after 2600 seconds. We monitored 
disk I/O and it was near 100% the whole time.


What is wrong?

Thank you,

  Laszlo




[PERFORM] UFS 2: soft updates vs. gjournal (AKA: Choosing a filesystem 2.)

2008-09-24 Thread Laszlo Nagy

Hi again,

Should I use gjournal on FreeBSD 7? Or just soft updates?

Here is my opinion: I suspect that gjournal would be much slower than
soft updates. Also, gjournal is relatively new code, not very well
tested. On the other hand, gjournal is better when the system crashes,
although I have heard that sometimes gjournal will crash the system
itself. There are more pros for soft updates, so I would prefer that.
But please let me know if I'm wrong.

Thanks,

  Laszlo








Re: [PERFORM] Choosing a filesystem

2008-09-11 Thread Laszlo Nagy



going to the same drives. This turns your fast sequential I/O into
random I/O with the accompaning 10x or more performance decrease.

 
Unless you have a good RAID controller with battery-backed-up cache.
  

All right. :-) This is what I'll have:

Boxed Intel Server Board S5000PSLROMB with 8-port SAS ROMB card 
(Supports 45nm processors (Harpertown and Wolfdale-DP)
Intel® RAID Activation key AXXRAK18E enables full intelligent SAS RAID 
on S5000PAL, S5000PSL, SR4850HW4/M, SR6850HW4/M. RoHS Compliant.
512 MB 400MHz DDR2 ECC Registered CL3 DIMM Single Rank, x8(for 
s5000pslromb)
6-drive SAS/SATA backplane with expander (requires 2 SAS ports) for 
SC5400 and SC5299 (two pieces)

5410 Xeon 2.33 GHz/1333 FSB/12MB, boxed, passive cooling / 80W (2 pieces)
2048 MB 667MHz DDR2 ECC Fully Buffered CL5 DIMM Dual Rank, x8 (8 pieces)

SAS disks will be:  146.8 GB, SAS 3G,15000RPM, 16 MB cache (two pieces)
SATA disks will be: HDD Server SEAGATE Barracuda ES 7200.1 (320GB, 16MB, SATA II-300) (10 pieces)


I cannot spend more money on this computer, but since you are all 
talking about battery backup, I'll try to get money from the management 
and buy this:


Intel® RAID Smart Battery AXXRSBBU3, optional battery backup for use 
with AXXRAK18E and SRCSAS144E. RoHS Compliant.



This server will also be an IMAP server, web server etc., so I'm 100% 
sure that the SAS disks will be used for logging. I have two spare 200GB 
SATA disks here in the office, but they are cheap ones designed for 
desktop computers. Is it okay to dedicate these disks to the WAL, in 
RAID 1? Will it improve performance? How much trouble would it cause 
if the WAL goes wrong? Should I just put the WAL on the RAID 
1+0 array?


Thanks,

 Laszlo




[PERFORM] Choosing a filesystem

2008-09-11 Thread Laszlo Nagy
I'm about to buy a new server. It will be a Xeon system with two 
processors (4 cores per processor) and  16GB RAM. Two RAID extenders 
will be attached to an Intel s5000 series motherboard, providing 12 
SAS/Serial ATA connectors.


The server will run FreeBSD 7.0, PostgreSQL 8, apache, PHP, mail server, 
dovecot IMAP server and background programs for database maintenance. On 
our current system, I/O performance for PostgreSQL is the biggest 
problem, but sometimes all CPUs are at 100%. Number of users using this 
system:


PostgreSQL:  30 connections
Apache: 30 connections
IMAP server: 15 connections

The databases are mostly OLTP, but the background programs are creating 
historical data and statistical data continuously, and sometimes web site 
visitors/search engine robots run searches on bigger tables (with 
3 million+ records).


There is an expert at the company who sells the server, and he 
recommended that I use SAS disks for the base system at least. I would 
like to use many SAS disks, but they are just too expensive. So the 
basic system will reside on a RAID 1 array, created from two SAS disks 
spinning at 15 000 rpm. I will buy 10 pieces of Seagate Barracuda 320GB 
SATA (ES 7200) disks for the rest.


The expert told me to use RAID 5 but I'm hesitating. I think that RAID 
1+0 would be much faster, and I/O performance is what I really need.


I would like to put the WAL file on the SAS disks to improve 
performance, and create one big RAID 1+0 disk for the data directory. 
But maybe I'm completely wrong. Can you please advise how to create 
logical partitions? The hardware is capable of handling different types 
of RAID volumes on the same set of disks. For example, a smaller RAID 0 
for indexes and a bigger RAID 5 etc.


If you need more information about the database, please ask. :-)

Thank you very much,

  Laszlo




Re: [PERFORM] Planning a new server - help needed

2008-03-28 Thread Laszlo Nagy



I guess you mean postgresql 8.3.1? :-)
  

Yep. Sorry.

 Question 3. FreeBSD 7.0 can use the ZFS file system. I suspect that UFS
 2 + soft updates will be better, but I'm not sure. Which is better?



I'd stick with ufs2 atm. There are some issues with zfs which probably
have been ironed out by now but ufs2 has been deployed for a longer
time. Performance-wise they are about the same.
  
Thank you. I suspected the same but it was good to get positive 
confirmation.

 Question 4. How to make the partitions? This is the hardest question.
 Here is my plan:

 - the OS resides on 2 disks, RAID 1
 - the databases should go on 8 disks, RAID 0 + 1



If you have enough disks raid-6 should perform almost as good as raid
1+0. 
Hmm, I have heard that RAID 1 or RAID 1 + 0 should be used for 
databases,  never RAID 5. I know nothing about RAID 6. I guess I must 
accept your suggestion since you have more experience than I have. :-) 
Obviously, it would be easier to manage a single RAID 6 array.

I've setup 11 disks in raid-6 plus one hotspare so I can get more
space out of it. "Enough disks" are approx. eight and up.
  
The RAID controller that I have selected can only handle 8 disks. I 
guess I need to find a different one with 16 channels and use more 
disks. So are you saying that with all disks in a bigger RAID 6 array, I 
will get the most out of the hardware? In that case, I'll try to get a 
bit more money from the management and build RAID 6 with 12 disks.


I also feel that I need to use a separate RAID 1 array (I prefer 
gmirror) for the base system.


Thanks,

  Laszlo




[PERFORM] Planning a new server - help needed

2008-03-28 Thread Laszlo Nagy

Hello,

I need to install a new server for PostgreSQL 8.3. It will run two 
databases, a web server and some background programs. We already have a 
server, but it is becoming slow and we would like to have something 
faster. It is a cost-sensitive application, and I would like to get 
your opinion on a few questions.


The database itself is an OLTP system. There are many smaller tables 
and some bigger ones (the biggest table has 1.2 million records, table size 
966MB, index size 790MB). In the bigger tables only a few records are 
updated frequently; most of the other records do not change. 
The smaller tables are updated continuously.


Question 1. We are going to use PostgreSQL 3.1 with FreeBSD. The pg docs 
say that it is better to use FreeBSD because it can alter the I/O 
priority of processes dynamically. The latest legacy release is 6.3 
which is probably more stable. However, folks say that 7.0 has superior 
performance on the same hardware. Can I use 7.0 on a production server?


Question 2. SCSI or SATA? I plan to buy a RocketRAID 3520 controller 
with 8 SATA 2 disks. The operating system would be on another disk pair, 
connected to the motherboard's controller. I wonder if I could get more 
performance with SCSI for the same amount of money. (I can spend about 
$1500 on the controller and the disks; that would cover 10 SATA 2 disks 
and the controller.)


Question 3. FreeBSD 7.0 can use the ZFS file system. I suspect that UFS 
2 + soft updates will be better, but I'm not sure. Which is better?


Question 4. How to make the partitions? This is the hardest question. 
Here is my plan:


- the OS resides on 2 disks, RAID 1
- the databases should go on 8 disks, RAID 0 + 1

However, the transaction log file should be on a separate disk and maybe 
I could gain more performance by putting indexes on a separate drive, 
but I do not want to reduce the number of disks in the RAID 0+1 array. 
Should I put indexes and transaction log on the RAID 1 array? Or should 
I invest a bit more money, add an SATA RAID controller with 16 channels 
and add more disks? Would it pay the bill? Another alternative is to put 
the biggest tables on a separate array so that it will be faster when we 
join these tables with other tables.
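
Once the arrays are mounted as separate filesystems, PostgreSQL 8.x 
tablespaces can pin individual tables and indexes to a specific array. 
Below is only a minimal sketch with placeholder paths and object names 
(not from the actual schema); note that the transaction log (pg_xlog) is 
not covered by tablespaces and would have to be relocated at the 
filesystem level (typically by symlinking the directory):

-- Directories must already exist, be empty, and be owned by the postgres system user.
CREATE TABLESPACE data_raid10 LOCATION '/raid10/pgdata';   -- the big RAID 1+0 array
CREATE TABLESPACE idx_raid1   LOCATION '/raid1/pgindex';   -- the smaller RAID 1 array

-- New objects can be placed on a specific array:
CREATE TABLE history_data (
    id      bigint PRIMARY KEY,
    payload text
) TABLESPACE data_raid10;

CREATE INDEX idx_history_payload ON history_data (payload) TABLESPACE idx_raid1;

-- Existing objects can be moved as well; the data is rewritten while the object is locked:
ALTER TABLE history_data SET TABLESPACE pg_default;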


I know that it is hard to answer without knowing the structure of the 
databases. :-( I can make tests with different configurations later, but 
I would like to know your opinion first - what should I try?


Thanks,

  Laszlo




Re: [PERFORM] Poor performance on seq scan

2006-09-13 Thread Laszlo Nagy



I have had extremely bad performance historically with onboard SATA chipsets
on Linux.  The one exception has been with the Intel based chipsets (not the
CPU, the I/O chipset).
  
This board has an Intel chipset. I cannot remember the exact type, but it 
was not in the low-end category.

dmesg says:


kernel: ad4: 152626MB  at ata2-master SATA150
kernel: ad4: 152627MB  at ata3-master SATA150


It is very likely that you are having problems with the driver for the
chipset.

Are you running RAID1 in hardware?  If so, turn it off and see what the
performance is.  The onboard hardware RAID is worse than useless, it
actually slows the I/O down.
  

I'm using software raid, namely gmirror:

GEOM_MIRROR: Device gm0 created (id=2574033628).
GEOM_MIRROR: Device gm0: provider ad4 detected.
GEOM_MIRROR: Device gm0: provider ad6 detected.
GEOM_MIRROR: Device gm0: provider ad4 activated.
GEOM_MIRROR: Device gm0: provider ad6 activated.

#gmirror list
Geom name: gm0
State: COMPLETE
Components: 2
Balance: round-robin
Slice: 4096
Flags: NONE
GenID: 0
SyncID: 1
ID: 2574033628
Providers:
1. Name: mirror/gm0
  Mediasize: 160040803328 (149G)
  Sectorsize: 512
  Mode: r5w5e6
Consumers:
1. Name: ad4
  Mediasize: 160040803840 (149G)
  Sectorsize: 512
  Mode: r1w1e1
  State: ACTIVE
  Priority: 0
  Flags: DIRTY
  GenID: 0
  SyncID: 1
  ID: 1153981856
2. Name: ad6
  Mediasize: 160041885696 (149G)
  Sectorsize: 512
  Mode: r1w1e1
  State: ACTIVE
  Priority: 0
  Flags: DIRTY
  GenID: 0
  SyncID: 1
  ID: 3520427571


I tried to do:

#sysctl vfs.read_max=32
vfs.read_max: 6 -> 32

but I could not achieve better disk read performance.

Thank you for your suggestions. Looks like I need to buy SCSI disks.

Regards,

  Laszlo




Re: [PERFORM] Poor performance on seq scan

2006-09-12 Thread Laszlo Nagy

Craig A. James wrote:


There IS a bug for SATA disk drives in some versions of the Linux 
kernel.  On a lark I ran some of the I/O tests in this thread, and 
much to my surprise discovered my write speed was 6 MB/sec ... ouch!  
On an identical machine, different kernel, the write speed was 54 MB/sec.

My disks are running in SATA150 mode, whatever that means.

I'm using FreeBSD, and not just because it dynamically alters the 
priority of long running  processes. :-)


 Laszlo




Re: [PERFORM] Poor performance on seq scan

2006-09-12 Thread Laszlo Nagy

Tom Lane wrote:

Why is that showing 85+ percent *system* CPU time??  I could believe a
lot of idle CPU if the query is I/O bound, or a lot of user time if PG
was being a hog about doing the ~~ comparisons (not too unlikely BTW).
  
I'm sorry, this was really confusing. I don't know what it was - 
probably a background system process, started from cron (?). I retried 
the same query and I got this:


zeusd1=> explain analyze select id,name from product where name like 
'%Mug%';

QUERY PLAN

Seq Scan on product  (cost=0.00..206891.34 rows=36487 width=40) (actual 
time=17.188..44585.176 rows=91399 loops=1)

  Filter: (name ~~ '%Mug%'::text)
Total runtime: 44631.150 ms
(3 rows)

tty ad4  ad6 cpu
tin tout  KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
0   62 115.25 143 16.06  116.03 143 16.17   3  0  9  3 85
  0   62 122.11 144 17.12  121.78 144 17.07   6  0  3  2 89
  0   62 126.18 158 19.45  125.86 157 19.28   5  0 11  6 79
  0   62 126.41 131 16.13  127.52 132 16.39   5  0  9  6 80
  0   62 127.80 159 19.81  126.89 158 19.55   5  0  9  0 86
  0   62 125.29 165 20.15  126.26 165 20.30   5  0 14  2 80
  0   62 127.22 164 20.32  126.74 165 20.37   5  0  9  0 86
  0   62 121.34 150 17.75  120.76 149 17.54   1  0 13  3 82
  0   62 121.40 143 16.92  120.33 144 16.89   5  0 11  3 82
  0   62 127.38 154 19.12  127.17 154 19.09   8  0  8  5 80
  0   62 126.88 129 15.95  127.00 128 15.84   5  0  9  5 82
  0   62 118.48 121 13.97  119.28 121 14.06   6  0 17  3 74
  0   62 127.23 146 18.10  126.79 146 18.04   9  0 20  2 70
  0   62 127.27 153 18.98  128.00 154 19.21   5  0 17  0 79
  0   62 127.02 130 16.09  126.28 130 16.00  10  0 16  3 70
  0   62 123.17 125 15.00  122.40 125 14.91   5  0 14  2 80
  0   62 112.37 130 14.24  112.62 130 14.27   0  0 14  3 83
  0   62 115.83 138 15.58  113.97 138 15.33   3  0 18  0 79

A bit better transfer rate, but nothing serious.

Regards,

  Laszlo





[PERFORM] tsearch2 question (was: Poor performance on seq scan)

2006-09-12 Thread Laszlo Nagy

Tom Lane wrote:

Only if the index is capable of disgorging the original value of the
indexed column, a fact not in evidence in general (counterexample:
polygons indexed by their bounding boxes in an r-tree).  But yeah,
it's interesting to think about applying filters at the index fetch
step for index types that can hand back full values.  This has been
discussed before --- I think we had gotten as far as speculating about
doing joins with just index values.  See eg here:
http://archives.postgresql.org/pgsql-hackers/2004-05/msg00944.php
A lot of the low-level concerns have already been dealt with in order to
support bitmap indexscans, but applying non-indexable conditions before
fetching from the heap is still not done.
  

To overcome this problem, I created a smaller "shadow" table:

CREATE TABLE product_search
(
 id int8 NOT NULL,
 name_desc text,
 CONSTRAINT pk_product_search PRIMARY KEY (id)
);

insert into product_search
 select
   id,   
   name || ' ' || coalesce(description,'')

 from product;


Obviously, this is almost like an index, but I need to maintain it 
manually. I'm able to search with


zeusd1=> explain analyze select id from product_search where name_desc 
like '%Mug%';

  QUERY PLAN

Seq Scan on product_search  (cost=0.00..54693.34 rows=36487 width=8) 
(actual time=20.036..2541.971 rows=91399 loops=1)

  Filter: (name_desc ~~ '%Mug%'::text)
Total runtime: 2581.272 ms
(3 rows)

The total runtime remains below 3 sec in all cases. Of course I still 
need to join the main table to the result:


explain analyze select s.id,p.name from product_search s inner join 
product p on (p.id = s.id) where s.name_desc like '%Tiffany%'


 QUERY PLAN  

Nested Loop  (cost=0.00..55042.84 rows=58 width=40) (actual 
time=164.437..3982.610 rows=117 loops=1)
  ->  Seq Scan on product_search s  (cost=0.00..54693.34 rows=58 
width=8) (actual time=103.651..2717.914 rows=117 loops=1)

Filter: (name_desc ~~ '%Tiffany%'::text)
  ->  Index Scan using pk_product_id on product p  (cost=0.00..6.01 
rows=1 width=40) (actual time=10.793..10.796 rows=1 loops=117)

Index Cond: (p.id = "outer".id)
Total runtime: 4007.283 ms
(6 rows)

Took 4 seconds. Awesome! With the original table, it used to be one or 
two minutes!
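
Since the shadow table has to be maintained by hand, one way to keep it 
in sync automatically is a trigger on the product table. This is only a 
minimal sketch, assuming PL/pgSQL is installed and using the column names 
from above; it is not tested against the real schema:

CREATE OR REPLACE FUNCTION product_search_sync() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        DELETE FROM product_search WHERE id = OLD.id;
        RETURN OLD;
    ELSIF TG_OP = 'UPDATE' THEN
        UPDATE product_search
           SET name_desc = NEW.name || ' ' || coalesce(NEW.description, '')
         WHERE id = NEW.id;
        RETURN NEW;
    ELSE  -- INSERT
        INSERT INTO product_search (id, name_desc)
        VALUES (NEW.id, NEW.name || ' ' || coalesce(NEW.description, ''));
        RETURN NEW;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_product_search_sync
    AFTER INSERT OR UPDATE OR DELETE ON product
    FOR EACH ROW EXECUTE PROCEDURE product_search_sync();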


Now you may ask why I am not using tsearch2 for this. Here is the answer:

CREATE TABLE product_search
(
 id int8 NOT NULL,
 ts_name_desc tsvector,
 CONSTRAINT pk_product_search PRIMARY KEY (id)
);

insert into product_search
 select
   id,   
   to_tsvector(name || ' ' || coalesce(description,''))

 from product;

CREATE INDEX idx_product_search_ts_name_desc  ON product_search  USING 
gist  (ts_name_desc);

VACUUM product_search;

zeusd1=> explain analyze select id from product_search where 
ts_name_desc @@ to_tsquery('mug');
                                   QUERY PLAN
--------------------------------------------------------------------------------

Bitmap Heap Scan on product_search  (cost=25.19..3378.20 rows=912 
width=8) (actual time=954.669..13112.009 rows=91434 loops=1)

 Filter: (ts_name_desc @@ '''mug'''::tsquery)
 ->  Bitmap Index Scan on idx_product_search_ts_name_desc  
(cost=0.00..25.19 rows=912 width=0) (actual time=932.455..932.455 
rows=91436 loops=1)

   Index Cond: (ts_name_desc @@ '''mug'''::tsquery)
Total runtime: 13155.724 ms
(5 rows)

zeusd1=> explain analyze select id from product_search where 
ts_name_desc @@ to_tsquery('tiffany');
   
                                   QUERY PLAN
--------------------------------------------------------------------------------

Bitmap Heap Scan on product_search  (cost=25.19..3378.20 rows=912 
width=8) (actual time=13151.725..13639.112 rows=76 loops=1)

 Filter: (ts_name_desc @@ '''tiffani'''::tsquery)
 ->  Bitmap Index Scan on idx_product_search_ts_name_desc  
(cost=0.00..25.19 rows=912 width=0) (actual time=13123.705..13123.705 
rows=81 loops=1)

   Index Cond: (ts_name_desc @@ '''tiffani'''::tsquery)
Total runtime: 13639.478 ms
(5 rows)

At least 13 seconds, and the main table is not joined yet. Can anybody 
explain to me, why the seq scan is faster than the bitmap index? In the 
last example there were only 81 rows returned, but it took more than 13 
seconds. :(  Even if the whole table can be cached into memory (which 
isn't the case), the bitmap index should be much faster. Probably t

Re: [PERFORM] Poor performance on seq scan

2006-09-12 Thread Laszlo Nagy

Heikki Linnakangas wrote:


Is there any other columns besides id and name in the table? How big 
is products.txt compared to the heap file?
Yes, many other columns. The products.txt file is only 59MB, which is 
similar to the size of the index (66MB).


Another question: I have a btree index on product(name). It contains 
all product names and the identifiers of the products. Wouldn't it be 
easier to seq scan the index instead of seq scan the table? The index 
is only 66MB, the table is 1123MB.


Probably, but PostgreSQL doesn't know how to do that. Even if it did, 
it depends on how many matches there is. If you scan the index and 
then fetch the matching rows from the heap, you're doing random I/O to 
the heap. That becomes slower than scanning the heap sequentially if 
you're going to get more than a few hits.
I have 700 000 rows in the table, and usually there are less than 500 
hits. So probably using a "seq index scan" would be faster. :-) Now I 
also tried this:


create table test(id int8 not null primary key, name text);
insert into test select id,name from product;

And then:

zeusd1=> explain analyze select id,name from test where name like 
'%Tiffany%';

  QUERY PLAN
-
Seq Scan on test  (cost=0.00..26559.62 rows=79 width=40) (actual 
time=36.595..890.903 rows=117 loops=1)

  Filter: (name ~~ '%Tiffany%'::text)
Total runtime: 891.063 ms
(3 rows)

But this might be coming from the disk cache. Thank you for your 
comments. We are making progress.
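
To see how much of the speedup comes simply from the smaller heap (rather 
than from caching), the on-disk sizes can be compared directly. A minimal 
check, assuming PostgreSQL 8.1 or later for pg_relation_size():

-- compare the heap size of the wide table with the narrow copy
select pg_size_pretty(pg_relation_size('product')) as product_heap,
       pg_size_pretty(pg_relation_size('test'))    as test_heap;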


  Laszlo




Re: [PERFORM] Poor performance on seq scan

2006-09-12 Thread Laszlo Nagy

Luke Lonergan wrote:
Lazlo, 

  

Meanwhile, "iostat 5" gives something like this:

 tin tout  KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
   1   14 128.00   1  0.10  128.00   1  0.10   5  0 94  1  0
   0   12 123.98 104 12.56  123.74 104 12.56   8  0 90  2  0



This is your problem.  Do the following and report the results here:

Take the number of GB of memory you have (say 2 for 2GB), multiply it by
250,000.  This is the number of 8KB pages you can fit in twice your RAM.
Let's say you have 2GB - the result is 500,000.

Use that number to do the following test on your database directory:
  time bash -c "dd if=/dev/zero of=/<your data directory>/bigfile bs=8k
count=<number from above> && sync"
  
I have 1GB RAM. The data directory is in /usr/local/pgsql/data. The root 
of this fs is /usr.


time sh -c "dd if=/dev/zero of=/usr/test/bigfile bs=8k count=250000 && 
sync"


250000+0 records in
250000+0 records out
2048000000 bytes transferred in 48.030627 secs (42639460 bytes/sec)
0.178u 8.912s 0:48.31 18.7% 9+96k 37+15701io 0pf+0w



Then do this:
  time bash -c "dd if=/<your data directory>/bigfile of=/dev/null bs=8k"
  

time sh -c "dd if=/usr/test/bigfile of=/dev/null bs=8k"

250000+0 records in
250000+0 records out
2048000000 bytes transferred in 145.293473 secs (14095609 bytes/sec)
0.110u 5.857s 2:25.31 4.1%  10+99k 32923+0io 0pf+0w

At this point I thought there was another process doing I/O, so I 
retried:


250000+0 records in
250000+0 records out
2048000000 bytes transferred in 116.395211 secs (17595226 bytes/sec)
0.137u 5.658s 1:56.51 4.9%  10+103k 29082+0io 0pf+1w

and again:

250000+0 records in
250000+0 records out
2048000000 bytes transferred in 120.198224 secs (17038521 bytes/sec)
0.063u 5.780s 2:00.21 4.8%  10+98k 29776+0io 0pf+0w

This is a mirror of two SATA disks. In theory, writing should be slower 
than reading. Is this a hardware problem? Or did "sync" not actually 
flush the data to disk?


 Laszlo




[PERFORM] Poor performance on seq scan

2006-09-12 Thread Laszlo Nagy


 Hello,

I have a big table called products. Table size: 1123MB. Toast table 
size: 32MB. Indexes size: 380MB.

I try to do a query like this:

select id,name from products where name like '%Mug%';

Yes, I know that tsearch2 is better for this, but please read on. The 
above query gives this plan:


Seq Scan on product  (cost=0.00..153489.52 rows=31390 width=40)
 Filter: (name ~~ '%Mug%'::text)

When I use this with explain analyze:

"Seq Scan on product  (cost=0.00..153489.52 rows=31390 width=40) (actual 
time=878.873..38300.588 rows=72567 loops=1)"

"  Filter: (name ~~ '%Mug%'::text)"
"Total runtime: 38339.026 ms"

Meanwhile, "iostat 5" gives something like this:

tin tout  KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
  1   14 128.00   1  0.10  128.00   1  0.10   5  0 94  1  0
  0   12 123.98 104 12.56  123.74 104 12.56   8  0 90  2  0
  0   12 125.66 128 15.75  125.26 128 15.68  10  0 85  6  0
  0   12 124.66 129 15.67  124.39 129 15.64  12  0 85  3  0
  0   12 117.13 121 13.87  117.95 121 13.96  12  0 84  5  0
  0   12 104.84 118 12.05  105.84 118 12.19  10  0 87  2  0

130 transfers per second with 12-15MB/sec transfer speed. (FreeBSD 6.1 
with two SATA150 drives in gmirror RAID1.)


I ran another test. I created a file with the identifiers and names of 
the products:


psql#\o products.txt
psql#select id,name from product;

Then I can search using grep:

grep "Mug" products.txt | cut -f1 -d\|

There is a huge difference. This command runs within 0.5 seconds, which 
is at least 76 times faster than the seq scan. The result is the same 
even if I vacuum, back up and restore the database. I thought that the 
table is stored in one file, so the seq scan should actually be faster 
than grepping the file. Can you please tell me what I am doing wrong? I'm 
not sure whether I can increase the performance of a seq scan by 
adjusting the values in postgresql.conf. I do not like the idea of 
periodically exporting the product table into a text file and searching 
it with grep. :-)


Another question: I have a btree index on product(name). It contains all 
product names and the identifiers of the products. Wouldn't it be easier 
to seq scan the index instead of seq scan the table? The index is only 
66MB, the table is 1123MB.


I'm new to this list and I have only recently started tuning PostgreSQL, 
so please forgive me if this is a dumb question.


Regards,

  Laszlo
